<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Aggregation in Locomotor Multi-Agent Systems: a Theoretical Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paolo Pagliuca</string-name>
          <email>paolo.pagliuca@istc.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandra Vitanza</string-name>
          <email>alessandra.vitanza@istc.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Cognitive Sciences and Technologies, National Research Council (CNR-ISTC)</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>2</volume>
      <fpage>8</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>The synthesis of collective behaviors in Multi-Agent Systems is typically approached using various methods, with Evolutionary Algorithms being among the most prevalent. In these systems, agents engage in local interactions with their peers and collectively adopt strategies that manifest at a group level, resembling social behaviors seen in animal societies. We extended the AntBullet problem, which is part of the PyBullet simulation tool, to a collective scenario in which a group of five homogeneous robots must aggregate during locomotion. To evolve this behavior, we employed the OpenAI-ES algorithm alongside a multi-objective fitness function. Our findings indicate that while the robots developed successful locomotion behaviors, they did not exhibit aggregation. This discrepancy is attributed to design choices that unintentionally emphasized locomotion over aggregation capabilities. We discuss the dynamic interplay induced by the fitness function to validate our results and outline future directions. Ultimately, this work is a first attempt to establish a framework for analyzing collective behaviors using advanced algorithms within modern simulation environments.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-Agent Systems</kwd>
        <kwd>aggregation</kwd>
        <kwd>fitness function</kwd>
        <kwd>OpenAI-ES</kwd>
        <kwd>PyBullet</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        In Multi-Agent Systems (MASs) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], various approaches based on model-free machine learning
techniques are employed to address the emergence of collective behaviors. Reinforcement
Learning (RL) and Evolutionary Algorithms (EAs) are among the most widespread methods
used for this purpose. Common characteristics of such systems include the exploitation of
local interactions between agents to identify a common strategy beneficial at the group level,
mimicking behaviors observed in social animals such as ants, bees, birds, and fish. Recently, in
the simulation panorama, the PyBullet simulation tool [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] has become a standard for customizing
environments and testing algorithms like Evolutionary Strategies (ESs) and Reinforcement
Learning. Indeed, PyBullet offers a wide range of problems, from classic control tasks to Atari
games and locomotion challenges. In this work, we propose a novel framework to investigate
collective behaviors in swarms, focusing on the aggregation problem — a common behavior
observed in nature [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6">3–6</xref>
        ] — that is utilized for tasks such as foraging, collective motion, and
defense against predators. Specifically, we extend the AntBullet problem, a standard benchmark
in PyBullet, to a collective scenario involving a group of five homogeneous robots with the
ultimate goal to aggregate while locomoting. We define a fitness function that rewards agents
for both capabilities and use the OpenAI-ES algorithm to evolve such behavior. Our outcomes
indicate that the robots evolve a successful locomotion behavior, but do not aggregate. In fact,
the fitness function was designed in a way that achieving a positive reward for aggregation
conflicts with the locomotion reward, which is considerably easier to obtain. Consequently,
agents behave as individuals living in the same environment but acting egoistically. However,
this work represents a first step towards creating a setup that enables the analysis of collective
behaviors using modern algorithms in advanced simulation environments.
      </p>
      <p>After a brief introduction about the common techniques and tools adopted in literature
(Subsections 1.1 and 1.2), the remaining part of the manuscript will cover the following: an
overview of related works to set the relative background (Section 2), followed by the presentation
of the theoretical framework (Section 3) and a description of the experimental setup (Section 4).
Subsequently, the obtained results will be presented (Section 5), and finally, the discussion and
conclusions will be drawn (Section 6).</p>
      <sec id="sec-2-1">
        <title>1.1. Model-free machine learning methods for MASs</title>
        <p>Model-free machine learning methods are techniques used in the field of machine learning,
where the learning algorithm does not explicitly create or maintain a model of the underlying
data distribution. Instead, these methods focus on learning directly from the data through trial
and error, often utilizing feedback signals such as rewards or penalties. Reinforcement
Learning (RL) is a prime example of this approach. In this sense, Evolutionary Strategies (ESs)
also fall under this category because they use a fitness function to guide the optimization process
without explicitly modeling the underlying dynamics of the environment. Strictly speaking, however,
Evolutionary Algorithms (EAs), including ESs, are model-free optimization methods rather
than traditional model-free learning methods.</p>
        <p>Model-free methods are instrumental in scenarios involving complex, highly dynamic, or
poorly understood systems, situations that make it challenging to construct an accurate model.
These methods excel in tasks such as robotic control, game playing, and autonomous
decision-making in uncertain environments. Specifically, the decentralized approaches intrinsic to
model-free learning methods are well-suited for MASs, where agents must adapt and coordinate
with limited information. Indeed, these approaches provide a powerful framework for learning
adaptive behaviors in MASs, fostering the emergence of collective intelligence and enabling
agents to navigate dynamic and complex environments effectively. Overall, applying
model-free machine learning in MASs offers opportunities for developing novel algorithms and
techniques tailored to multi-agent settings.</p>
      </sec>
      <sec id="sec-2-2">
        <title>1.2. Physics Engines and Simulation Platforms</title>
        <p>
          Physics engines are essential tools for simulating robots and evaluating their performance across
various tasks. There are several physics engines and simulation platforms used for simulating
robot scenarios. Among these physics engines, we can mention: (i) ODE (Open Dynamics
Engine) [
          <xref ref-type="bibr" rid="ref7">7</xref>
], a well-known open-source, high-performance library for simulating rigid-body
dynamics. It focuses on real-time simulation and supports multi-agent simulation. Widely
used in robotics, games, and simulation environments, it is commonly used in research projects
[
          <xref ref-type="bibr" rid="ref10 ref8 ref9">8–10</xref>
          ]. (ii) PhysX [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], a robust physics engine developed by NVIDIA, known for its accuracy
and performance in simulating physical interactions. It supports real-time simulations and GPU
acceleration, making it popular in game development and high-fidelity simulations in robotics
scenarios [
          <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
          ]. Finally, (iii) Bullet Physics [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], the open-source physics engine chosen
for this work. It includes support for collision detection, and rigid and soft body dynamics,
and is suitable for simulating complex robotic scenarios. These physics engines provide a
robust foundation for implementing simulation platforms, each offering unique features and
capabilities useful for studying swarm robotics scenarios. Notable simulation platforms include:
• Gazebo [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] is a powerful open-source 3D robotics simulator widely used in research
and development. It is a physics-realistic simulator built on ODE, although it recently
supports multiple physics engines (including Bullet and others) and integrates with ROS
(Robot Operating System) [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Due to its features, it is ideal for simulating complex
environments with multiple robots, including swarm robotics.
• CoppeliaSim (formerly V-REP) [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] is a versatile robot simulation software. It supports
a wide range of robots and environments. The simulator offers four physics engines (i.e.,
Bullet Physics, ODE, Newton, and Vortex Dynamics) and is frequently used for swarm
robotics due to its flexibility and ease of creating complex interactions and behaviors.
• Webots [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] is an open-source robot simulation software that allows the modeling,
programming, and simulation of mobile robots built on ODE. It is excellent for educational
purposes and research in swarm robotics, supporting realistic simulations and prototyping.
• ARGoS (Autonomous Robots Go Swarming) [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] is a simulation framework
specifically designed for swarm robotics. Its most important features are its high efficiency,
capability of simulating large-scale robot groups, and customization of both the robots and
the environment. It was primarily designed for research in swarm robotics to facilitate
experiments and results acquisition.
• FARSA (Framework for Autonomous Robotics Simulation Applications) [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] is
a simulation framework specifically designed for studying and developing autonomous
robots, with a particular focus on swarm robotics. It is particularly suited for researchers
who develop and test algorithms for collective intelligence and autonomous decision-making
in robotic systems. The major features of FARSA are its high modularity and
extensibility, which allow users to extend and customize the simulation framework
to specific needs. FARSA supports the rapid creation of custom robots and simulates
complex environments in which the robots operate. FARSA is supported by a community
of researchers and developers, contributing to its continuous improvement and the
availability of shared resources.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>1.2.1. PyBullet and AntBullet problem</title>
        <p>
          PyBullet [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] is a physics simulator built on top of the Bullet physics engine [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Developed
primarily in C++, it was designed with a focus on robotics manipulation, enabling the simulation
of articulated rigid bodies, physical joints, and constraints. It also integrates learning algorithms
and supports a range of features including various sensors, gripper and multi-agent simulation,
and lightweight graphics rendering. Despite its advantages, PyBullet faces challenges such as
long simulation times and complexity in setting up the environment. However, its versatility
and open-source nature make it a valuable tool in robotics simulation and research. PyBullet
is easy to use with zero overhead when integrating a physics engine into a Python program.
Moreover, it is tailored to encourage and facilitate the use of constraint-based descriptions
to abstract physics for modern robotic learning algorithms. Therefore, it can be used with
any machine-learning technique that supports PyTorch [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], SimPy [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], in pure Python or
conjunction with CUDA [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ].
        </p>
        <p>
          AntBullet [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] is an environmental simulation implemented in PyBullet where an ant robot
with 8 joints is simulated (AntBulletEnv). It serves as a benchmark for reinforcement-learning
simulations [
          <xref ref-type="bibr" rid="ref24 ref25 ref26 ref27">24–28</xref>
          ]. The goal is to reach a specified end position in the fewest steps possible.
In the context of benchmarking, the AntBullet environment is used to evaluate and compare
the performance of various algorithms. Generally, the goal is to optimize the movements of the
Ant robot, enabling it to perform effectively in the simulation. Moreover, AntBullet requires
only minimal modifications to support swarm-based robotics.
        </p>
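To illustrate the kind of minimal modification involved, a single-robot environment can be wrapped into a swarm environment along the following lines; the classes below are illustrative stubs written for this sketch, not part of the PyBullet API:

```python
# Hypothetical sketch: wrapping N copies of a single-robot environment into a
# swarm environment. SingleAnt is a stand-in for the real AntBullet robot.

class SingleAnt:
    """Stand-in for one AntBullet robot: accepts 8 motor commands per step."""
    def __init__(self, start_pos):
        self.pos = list(start_pos)

    def step(self, action):
        # A real robot would apply the 8 motor torques via the physics engine;
        # here we simply shift the robot along x by the mean action value.
        self.pos[0] += sum(action) / len(action)
        return self.pos  # placeholder observation


class SwarmEnv:
    """Steps N homogeneous robots that share one simulated world."""
    def __init__(self, start_positions):
        self.robots = [SingleAnt(p) for p in start_positions]

    def step(self, actions):
        # One list of 8 motor values per robot; one observation per robot.
        return [r.step(a) for r, a in zip(self.robots, actions)]


env = SwarmEnv([(0, 0), (1.5, 0), (-1.5, 0), (0, 1.5), (0, -1.5)])
obs = env.step([[0.1] * 8 for _ in range(5)])
print(len(obs))  # 5 observations, one per robot
```

The same pattern (a list of robot handles stepped inside one physics world) underlies the multi-agent extension used in this work.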
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Related works</title>
      <p>
        The background of the present proposal draws direct inspiration from [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. In this work, the
authors describe an experiment comparing the efficacy of various neuro-evolutionary strategies
for continuous control optimization. They discuss different algorithms, including OpenAI-ES
[29], CMA-ES (Covariance Matrix Adaptation Evolution Strategy) [30], sNES (Separable Natural
Evolution Strategy) [31], and xNES (Exponential Natural Evolution Strategy) [31], analyzing
their performance across a range of tasks. The experiment utilized a variety of benchmark
problems, such as locomotion tasks, Atari games, the double-pole balancing problem [32] and
a swarm foraging scenario introduced in [33]. The algorithms were evaluated based on their
ability to maximize a predefined reward function. Algorithm performance was assessed by the
total number of evaluations required to find a solution.
      </p>
      <p>The OpenAI-ES algorithm consistently outperformed other algorithms across all tested
problems, demonstrating robustness to changes in hyper-parameters. Results indicate that the
success of this method is attributed to the Adam optimizer [34] rather than the virtual batch
normalization technique [29, 35]. Furthermore, the effectiveness of the OpenAI-ES algorithm
in achieving collective decision-making for aggregation tasks is further demonstrated by the
results of our recent study [36]. In that work, we assessed the efficacy of the method both
quantitatively, through a performance analysis, and qualitatively, by evaluating the emergent
behavior in different environmental multi-agent setups.</p>
      <p>Similar analyses were conducted in [37], where two evolutionary algorithms, CMA-ES and
xNES, were compared in the context of a group of robots making collective decisions by means
of aggregation behaviors. The aim of the study was to determine to what extent the performance
and the distribution of the robots were affected by the environmental conditions (i.e., dimensions
of the sites), and to evaluate the final aggregation of the swarm.</p>
      <p>
        In particular, regarding the AntBullet problem [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], which represents the starting point of
this study, the OpenAI-ES evolutionary strategy outperforms the PPO (Proximal-Policy
Optimization) reinforcement learning algorithm [38], despite encountering some issues with the
reward function. Specifically, achieving effective behaviors and optimal performance across all
replications requires a small adjustment in the reward function, typically a bonus or punishment
of approximately ±0.01. Primarily, the authors recommend that future comparisons between
evolutionary and reinforcement learning algorithms should incorporate reward functions
specifically tailored to each algorithm class. This emphasis underscores the importance of utilizing
appropriate reward functions, as evidenced by the fact that reward functions designed for
reinforcement learning may not necessarily be efective for evolutionary strategies and vice
versa. Therefore, the study highlights the significant impact of the employed reward function
on algorithm performance (for a review of the fitness/reward functions used for Evolutionary
Algorithms, see [39]). In this context, Table 1 summarizes some interesting works by comparing
the algorithms used, the type of task, the defined fitness function, and the homogeneity of the
group. It serves to place our proposal in the context of the state-of-the-art.
      </p>
      <p>Algorithm 1 (OpenAI-ES, core loop):
for i = 1, …, λ do
  evaluate score: f_i⁺ ← F(θ_t + σ × ε_i)
  evaluate score: f_i⁻ ← F(θ_t − σ × ε_i)
end for
compute normalized ranks: u ← ranks(f)
estimate gradient: g_t ← (1/λ) ∑_{i=1}^{λ} u_i × ε_i
θ_{t+1} ← θ_t + Adam(g_t)</p>
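For illustration purposes, the core loop of this evolutionary strategy can be sketched in pure Python on a toy one-dimensional problem. The plain fitness-difference weighting below stands in for the rank normalization and Adam update of the full algorithm, so this is a simplified sketch, not a reference implementation:

```python
import random

def openai_es(f, theta, sigma=0.1, lr=0.05, lam=20, iters=200, seed=0):
    """Simplified OpenAI-ES-style search on a scalar parameter.

    For each of lam Gaussian perturbations eps, the fitness of the symmetric
    samples theta + sigma*eps and theta - sigma*eps is compared; the
    differences weight the gradient estimate (antithetic sampling).
    """
    rnd = random.Random(seed)
    for _ in range(iters):
        grad = 0.0
        for _ in range(lam):
            eps = rnd.gauss(0.0, 1.0)
            f_plus = f(theta + sigma * eps)
            f_minus = f(theta - sigma * eps)
            # symmetric sampling: the difference cancels much of the noise
            grad += (f_plus - f_minus) * eps
        grad /= (2 * lam * sigma)
        theta += lr * grad  # gradient *ascent* on the fitness
    return theta

# maximize f(x) = -(x - 3)^2, whose optimum is at x = 3
best = openai_es(lambda x: -(x - 3.0) ** 2, theta=0.0)
print(round(best, 2))  # converges close to the optimum at 3.0
```

The full algorithm replaces the raw fitness differences with normalized ranks and applies the update through the Adam optimizer, which is reported above as a key ingredient of its robustness.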
    </sec>
    <sec id="sec-4">
      <title>3. Evolutionary Strategy for Aggregation</title>
      <p>
        Intending to investigate the possibility of synthesizing a successful aggregation behavior in a
swarm of AntBullet robots, we employed the OpenAI-ES algorithm [
        <xref ref-type="bibr" rid="ref26">26, 29, 54</xref>
        ] (for a description,
see Alg. 1), which has been adopted to evolve aggregation in similar settings [36, 55]. Specifically,
we considered a swarm of five homogeneous AntBullet robots, controlled through a feed-forward
neural network with 28 inputs, 50 internal units and 8 outputs. The list of inputs and outputs of
the controller is reported in Table 2.
      </p>
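For illustration, the controller described above (28 inputs, 50 internal units, 8 outputs) can be sketched in pure Python; the weight ranges and the use of tanh activations are illustrative assumptions of this sketch:

```python
import math
import random

def make_layer(n_in, n_out, rnd):
    # each unit holds n_in weights plus one bias (last entry);
    # the initialization range is illustrative
    return [[rnd.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layer, x, activation):
    out = []
    for weights in layer:
        # zip pairs the first n_in weights with the inputs; weights[-1] is the bias
        s = weights[-1] + sum(w * xi for w, xi in zip(weights, x))
        out.append(activation(s))
    return out

rnd = random.Random(42)
hidden = make_layer(28, 50, rnd)   # 28 sensor inputs -> 50 internal units
output = make_layer(50, 8, rnd)    # 50 internal units -> 8 motor commands

observation = [0.0] * 28
motors = forward(output, forward(hidden, observation, math.tanh), math.tanh)
print(len(motors))  # 8 motor values, one per joint
```

In the actual experiments the connection weights of this network are the parameters optimized by OpenAI-ES.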
      <p>
        As we pointed out in Section 1.2.1, the AntBullet robot aims to locomote towards a target
direction. The fitness function used to evolve such behavior is defined in [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] and can be
expressed by Eq. 1, where the components p, e and j are computed as follows:
      </p>
      <p>F_ind = p + 0.01 + e + j (1)</p>
      <p>p = ‖pos_t − pos_{t−1}‖ (2)</p>
      <p>where pos_t indicates the current position; pos_{t−1} denotes the previous position; ‖⋅‖ is
the norm operator; e is a penalty computed from the values m_i of the N_m motors; and j is a
penalty proportional to n_j, the number of joints at limit.</p>
      <p>Table 2: List of input and output data of the robot’s neural network controller. The 28 inputs
are: z − z_init, the height of the agent relative to its initial height; the sine and cosine of the
relative angle to the target location; the robot velocities v_x, v_y and v_z along the three axes;
the roll and pitch of the robot; the position and velocity of each of the 8 joints (inputs 8–23);
and the contact flag of each of the four feet, i.e., 1 if the foot touches the ground and 0 otherwise
(inputs 24–27). The 8 outputs (0–7) are the motor values m_i applied to the i-th joint.</p>
      <p>Starting from F_ind, we extended the fitness function to deal with our collective scenario, in
which the robots must aggregate while locomoting. In particular, we computed the components
p, e and j separately for each of the five agents.</p>
      <p>Given the per-agent components p_i, e_i and j_i (i.e., the components of F_ind computed for
the generic i-th agent), we introduced a new component d_i rewarding agents for the capability
of staying at a target distance dist_target (set to 1.5 m) from the other mates. Given an agent i,
the component can be defined as in Eq. 5:</p>
      <p>d_i = (1/(N − 1)) ∑_{j=1, j≠i}^{N} e^(−100 · |dist_ij − dist_target|) (5)</p>
      <p>where N is the number of agents (here N = 5) and dist_ij denotes the distance between the
agents i and j.</p>
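A sketch of the aggregation component d_i of Eq. 5, under the interpretation that the reward decays exponentially with the deviation of each pairwise distance from the target distance (an assumption of this sketch):

```python
import math

def aggregation_component(i, positions, dist_target=1.5):
    """d_i: mean exponential reward over the distances to the other mates."""
    n = len(positions)
    xi, yi = positions[i]
    total = 0.0
    for j, (xj, yj) in enumerate(positions):
        if j == i:
            continue
        dist_ij = math.hypot(xi - xj, yi - yj)
        # sharp exponential: non-negligible only when dist_ij is near the target
        total += math.exp(-100.0 * abs(dist_ij - dist_target))
    return total / (n - 1)

# agents exactly at the target distance from agent 0 yield the maximum reward
ring = [(0.0, 0.0), (1.5, 0.0), (-1.5, 0.0), (0.0, 1.5), (0.0, -1.5)]
print(aggregation_component(0, ring))  # 1.0 when every mate is at 1.5 m
```

Because of the sharp exponential, the component is essentially zero both for mates that are far away and for mates much closer than the target distance.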
      <p>From Eq. 5, it is evident that robots very far from their mates will not receive any score.
Similarly, collisions between peers (i.e., distances well below the target distance) are discouraged,
since they prevent agents from locomoting. The d component is intended to foster aggregation
in the swarm during the motion. Accordingly, the following equation defines the resulting
fitness function for the multi-agent scenario:</p>
      <p>F_swarm = (1/N) ∑_{i=1}^{N} (p_i + e_i + j_i + d_i)</p>
      <p>where the single components refer to the generic agent i. We removed the bonus of 0.01 to
prevent local-minima behaviors, such as staying still.</p>
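The resulting multi-agent fitness, i.e., the group average of the per-agent component sums, can be sketched as follows (the component values below are hypothetical):

```python
def swarm_fitness(components):
    """Mean over agents of p_i + e_i + j_i + d_i (no 0.01 alive bonus)."""
    per_agent = [p + e + j + d for (p, e, j, d) in components]
    return sum(per_agent) / len(per_agent)

# five agents with hypothetical progress/energy/joint-limit/aggregation scores
scores = [(2.0, -0.1, -0.25, 0.0)] * 5
print(swarm_fitness(scores))  # average of the five per-agent sums
```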
    </sec>
    <sec id="sec-5">
      <title>4. Experimental Setup</title>
      <p>
        We extended the AntBullet problem [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to deal with a collective scenario involving a group of 5
robots. The starting locations of the agents remain almost constant, and the initial formation of
the swarm is a cross, as shown in Fig. 1. A small amount of noise is added to the joints’ initial
positions to make the problem stochastic (parameter input noise in Table 3).
      </p>
      <p>The experiment was repeated 30 times. Evolution lasts 5 × 10⁷ evaluation steps. The
ability of the swarm to cope with the problem is estimated by evaluating the group in a single
episode lasting up to 1000 steps. Moreover, the generalization capability of the swarm is
computed over 3 post-evaluation episodes, each lasting a maximum of 1000 steps. An evaluation
episode is prematurely stopped if at least one of the agents falls on the ground. The full list of
evolutionary parameters is provided in Table 3.</p>
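The initial configuration described above can be sketched as follows; the spacing value, the noise range and the dictionary layout are illustrative assumptions of this sketch, not the exact values of Table 3:

```python
import random

def initial_swarm(spacing=1.5, joint_noise=0.01, n_joints=8, seed=None):
    """Five robots in a cross formation, with small noise on the initial joints."""
    rnd = random.Random(seed)
    # cross formation: one robot at the center, four along the axes
    formation = [(0.0, 0.0), (spacing, 0.0), (-spacing, 0.0),
                 (0.0, spacing), (0.0, -spacing)]
    robots = []
    for pos in formation:
        # small uniform perturbation of the 8 joint angles makes runs stochastic
        joints = [rnd.uniform(-joint_noise, joint_noise) for _ in range(n_joints)]
        robots.append({"position": pos, "joints": joints})
    return robots

swarm = initial_swarm(seed=1)
print(len(swarm))  # 5 robots arranged in a cross
```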
    </sec>
    <sec id="sec-6">
      <title>5. Results and Analysis</title>
      <p>In this section, we report the outcomes of our experiments. The average fitness (F_avg) obtained
in 30 replications of the experiment is 1769.389, with a standard deviation (σ) of 419.417.
The best replication achieved a fitness score of 2551.119, while the worst replication obtained a
fitness value of 862.517. Overall, 16 out of 30 replications (i.e., 53.33%) achieved a fitness value
above the average (see Fig. 2).</p>
      <p>If we analyze the impact of the p, e, j and d components of the fitness function F, we observe
that the score is mostly due to the first one (see Fig. 3). In fact, progressing toward a target (p)
is easier and does not depend on the behavioral capabilities of the mates. Conversely, reducing
the distance from the peers (d), i.e., aggregating, implies moving towards the other mates, thus
decreasing the reward provided by the p component. However, the d component rewards agents
only if their mutual distance is very close to the target distance (see Eq. 5). This explains why it
does not play any role in this context. Similarly, the e component has almost no impact on F;
that is, the OpenAI-ES algorithm synthesizes solutions that do not apply strong values to the
motors. Finally, the j component provides a punishment of around 0.25 per step during the
evaluation episode. This means that the AntBullet robots typically push the joints of one of the
four legs to their limits, a strategy that is helpful to keep balance on the ground (see Fig. 4).</p>
      <p>Table 3: Evolutionary parameters (the algorithm symbols have the same meaning as indicated
in Alg. 1; 𝒰 denotes the uniform distribution). The recoverable values are: # of replications = 30;
# of evaluation steps = 5 × 10⁷; # of symmetric samples = 20; # of robots = 5; # of episodes = 1;
# of post-evaluation episodes = 3; # of episode steps = 1000; # of hidden neurons = 50. The
table also reports the activation functions of the hidden and output neurons and the input and
output noise levels.</p>
      <p>Figure 2 shows the fitness obtained in the replications of the experiment; the horizontal line
marks the average fitness achieved.</p>
      <p>Overall, our outcomes demonstrate that the fitness function F makes the robots act egoistically,
since the two objectives, locomoting and aggregating, conflict with each other. This strategy is
spontaneously discovered by the OpenAI-ES algorithm, which attempts to optimize F. A similar
finding can be seen in [56], where no global incentive to cooperate or compete is given to the
group, and the chosen fitness does not drive agents to a preferred solution. In this case, the
evolutionary strategy evolves agents that implicitly include selfish sub-tasks if the coordination
within a team proves complex. Indeed, in this way, they can adapt to situational circumstances
and may change their propensity depending on the specific situation.</p>
      <p>Moreover, the individualistic inclination of the agents can be further explained by considering
that the group is formed by homogeneous robots, which share the same network controller and
the same physical embodiment. In contrast, in [53] the authors show how the group dynamics
in heterogeneous MASs are influenced by the different skills of the agents, which are more
adaptive and behave according to the particular mates they are evaluated with.</p>
      <p>Figure 5: Examples of trajectories performed by the agents of the swarm. The behavioral
strategies of the robots evolved in the replications that obtained a fitness score F &gt; F_avg + σ:
S6 (a), S13 (b), S14 (c) and S20 (d) (see Fig. 3 for details) were analyzed. The robots usually
move by maintaining the initial cross formation throughout the evaluation episode (e.g., (a),
(b), (d)). In some cases, the pressure to stay closer emerges during the motion, with some of the
agents moving close and even crossing their paths (e.g., (c)). Nevertheless, the overall mutual
distance between the group’s members is not sufficient to get a score from the d component
(see Fig. 3). This explains why the p component of the fitness function plays a more relevant
role than the d component.</p>
      <p>Fig. 5 shows the behavioral strategies exhibited by the agents evolved in the replications
where F &gt; F_avg + σ. Data are collected in a post-evaluation episode. As can be seen,
the agents generally move towards the target regardless of the behavior of their mates. The
pressure to aggregate does not emerge throughout the episode, although some of the robots
may sometimes cross their paths (see Fig. 5, c).</p>
      <p>To summarize, the results reveal the importance of the fitness function for the swarm’s evolved
behavior. Indeed, the agents successfully tackle the evolutionary problem (i.e., they manage to
locomote towards a target destination), though aggregation is not achieved. Carefully designing
the fitness function to address all the objectives is paramount for achieving a successful
collective strategy.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Discussion and Conclusion</title>
      <p>
        In this work, we proposed a theoretical framework to analyze collective behaviors in MASs by
exploiting modern simulation environments and using state-of-the-art Evolutionary Algorithms.
In particular, we extended the benchmark AntBullet problem to a collective scenario involving a
swarm of five AntBullet robots, whose goal was to aggregate during locomotion. We designed a
multi-objective fitness function rewarding agents for both capabilities. The results we achieved
demonstrate that agents successfully developed a locomotion capability, but they did not
aggregate. Undoubtedly, the definition of a multi-objective fitness function presents several
challenges because, in many cases, the objectives conflict with each other. Essentially, improving
performance on one objective might degrade performance on another. This was specifically the
case here, where the attempt to optimize the p component might have degraded the d component.
This contrast made it difficult to find an optimal solution that satisfied all objectives simultaneously.
Moreover, a problem with scaling may have occurred: the different components
might have had different scales during the trials, especially because, in dynamic environments,
the importance of different objectives might change over time. This implied that one objective
might have dominated the others, leading to biased results. This aligns with the recommendation
from the aforementioned study [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], emphasizing the importance of utilizing appropriate
reward functions tailored to each task. Thus, addressing these challenges often requires careful
consideration of the specific characteristics of the problem at hand.
      </p>
      <p>Regarding future research directions, several ideas emerge. Undoubtedly, future work should
focus on refining the multi-objective fitness function to better balance conflicting sub-goals.
Additionally, investigating alternative state-of-the-art evolutionary algorithms could produce
more sophisticated strategies for achieving both locomotion and aggregation. Comparing these
alternatives with OpenAI-ES, as done in some of our works, might provide insights into which
algorithms are more suitable for multi-objective tasks in dynamic environments. Moreover,
introducing communication mechanisms among the agents could potentially enhance the
emergence of aggregation behavior. Understanding how communication affects the swarm’s
ability to aggregate while maintaining effective locomotion could lead to developing more
sophisticated strategies. Lastly, as we discussed in Section 5, using a heterogeneous MAS similar
to [53] might foster the emergence of more complex dynamics, in which the individual skills
may play different roles and the resulting overall behavior cannot be predicted a priori. In
conjunction with this, efficient communication requires strong collaboration, and robots need
more intelligence to handle unknown situations. These demands are challenging to meet with
current architectures and hardware. Creating communication networks between heterogeneous
agents and designing efficient communication protocols are key challenges that need further
exploration. Indeed, considering hybrid solutions within the context of Edge Intelligence
platforms (EIP) or the Internet of Things (IoT), combined with evolutionary strategies for agent
populations, could be a promising direction for future research.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgement</title>
      <p>A.V. acknowledges support from the PNRR MUR project PE0000013-FAIR - Future Artificial
Intelligence Research.</p>
      <p>[46] S. Song, K. D. Miller, L. F. Abbott, Competitive Hebbian learning through spike-timing-dependent
synaptic plasticity, Nat Neurosci 3 (2000) 919–926.
[47] E. Ordaz-Rivas, L. Torres-Treviño, Improving performance in swarm robots using multi-objective
optimization, Mathematics and Computers in Simulation 223 (2024) 433–457.
[48] Q. Zhang, H. Li, MOEA/D: A multiobjective evolutionary algorithm based on decomposition,</p>
      <p>IEEE Transactions on evolutionary computation 11 (2007) 712–731.
[49] V. Trianni, R. Groß, T. H. Labella, E. Şahin, M. Dorigo, Evolving aggregation behaviors in
a swarm of robots, in: Advances in Artificial Life: 7th European Conference, ECAL 2003,
Dortmund, Germany, September 14-17, 2003. Proceedings 7, Springer, 2003, pp. 865–874.
[50] J. J. Grefenstette, et al., Genetic algorithms for changing environments, in: Ppsn, volume 2,</p>
      <p>Citeseer, 1992, pp. 137–144.
[51] D. Kengyel, H. Hamann, P. Zahadat, G. Radspieler, F. Wotawa, T. Schmickl, Potential of
heterogeneity in collective behaviors: A case study on heterogeneous swarms, in: PRIMA
2015: Principles and Practice of Multi-Agent Systems: 18th International Conference,
Bertinoro, Italy, October 26-30, 2015, Proceedings 13, Springer, 2015, pp. 201–217.
[52] P. Zahadat, T. Schmickl, Wolfpack-inspired evolutionary algorithm and a
reaction-difusionbased controller are used for pattern formation., in: GECCO, 2014, pp. 241–248.
[53] P. Pagliuca, A. Vitanza, N-mates evaluation: a new method to improve the performance
of genetic algorithms in heterogeneous multi-agent systems., Proceedings of the 24th
Edition of the Workshop From Object to Agents (WOA23) 3579 (2023) 123–137.
[54] P. Pagliuca, S. Nolfi, The dynamic of body and brain co-evolution, Adaptive Behavior 30
(2022) 245–255.
[55] J. Rais Martínez, F. Aznar Gregori, Comparison of evolutionary strategies for reinforcement
learning in a swarm aggregation behaviour, in: Proceedings of the 2020 3rd International
Conference on Machine Learning and Machine Intelligence, 2020, pp. 40–45.
[56] P. Pagliuca, D. Inglese, A. Vitanza, Measuring emergent behaviors in a mixed
competitivecooperative environment, International Journal of Computer Information Systems and
Industrial Management Applications 15 (2023) 69–86.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Balaji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          ,
          <article-title>An Introduction to Multi-Agent Systems</article-title>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2010</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Coumans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <article-title>PyBullet, a Python module for physics simulation for games, robotics and machine learning</article-title>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.-L.</given-names>
            <surname>Deneubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Detrain</surname>
          </string-name>
          ,
          <article-title>Dynamics of aggregation and emergence of cooperation</article-title>
          ,
          <source>The Biological Bulletin</source>
          <volume>202</volume>
          (
          <year>2002</year>
          )
          <fpage>262</fpage>
          -
          <lpage>267</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Garnier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jost</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jeanson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gautrais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Asadpour</surname>
          </string-name>
          , G. Caprari, G. Theraulaz,
          <article-title>Aggregation behaviour as a source of collective decision in a group of cockroach-like-robots</article-title>
          ,
          <source>in: European conference on artificial life</source>
          , Springer,
          <year>2005</year>
          , pp.
          <fpage>169</fpage>
          -
          <lpage>178</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Jeanson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rivault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-L.</given-names>
            <surname>Deneubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Blanco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fournier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jost</surname>
          </string-name>
          , G. Theraulaz,
          <article-title>Self-organized aggregation in cockroaches</article-title>
          ,
          <source>Animal behaviour 69</source>
          (
          <year>2005</year>
          )
          <fpage>169</fpage>
          -
          <lpage>180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Oldroyd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smolenski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lawler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Estoup</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Crozier</surname>
          </string-name>
          , Colony aggregations in apis mellifera l,
          <source>Apidologie</source>
          <volume>26</volume>
          (
          <year>1995</year>
          )
          <fpage>119</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Smith</surname>
          </string-name>
          , Open dynamics engine,
          <year>2008</year>
          . URL: http://www.ode.org/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Ondroušek</surname>
          </string-name>
          ,
          <source>The Solution of 3D Indoor Simulation of Mobile Robots Using ODE</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2010</year>
          , pp.
          <fpage>215</fpage>
          -
          <lpage>220</lpage>
          . doi:10.1007/978-3-642-05022-0_37.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Arena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Patané</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Termini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vitanza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Strauss</surname>
          </string-name>
          ,
          <article-title>Software/hardware issues in modelling insect brain architecture</article-title>
          , in: S. Jeschke,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          , D. Schilberg (Eds.),
          <source>Intelligent Robotics and Applications</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2011</year>
          , pp.
          <fpage>46</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <article-title>A robotic simulation framework for cognitive systems</article-title>
          ,
          <source>Spatial Temporal Patterns for Action-Oriented Perception in Roving Robots II: An Insect Brain Computational Model</source>
          (
          <year>2014</year>
          )
          <fpage>153</fpage>
          -
          <lpage>176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>NVIDIA</surname>
          </string-name>
          ,
          <source>PhysX library</source>
          ,
          <year>2008</year>
          . URL: https://developer.nvidia.com/physx-sdk.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lächele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Franchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Bülthoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Robuffo Giordano</surname>
          </string-name>
          ,
          <article-title>Swarmsimx: Real-time simulation environment for multi-robot systems</article-title>
          , in:
          <string-name>
            <given-names>I.</given-names>
            <surname>Noda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ando</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brugali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Kuffner</surname>
          </string-name>
          (Eds.), Simulation, Modeling, and Programming for Autonomous Robots, Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2012</year>
          , pp.
          <fpage>375</fpage>
          -
          <lpage>387</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hoeller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mazhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mandlekar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Babich</surname>
          </string-name>
          , G. State,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garg</surname>
          </string-name>
          ,
          <article-title>Orbit: A unified simulation framework for interactive robot learning environments</article-title>
          ,
          <source>IEEE Robotics and Automation Letters</source>
          <volume>8</volume>
          (
          <year>2023</year>
          )
          <fpage>3740</fpage>
          -
          <lpage>3747</lpage>
          . doi:10.1109/LRA.2023.3270034.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E.</given-names>
            <surname>Coumans</surname>
          </string-name>
          , Bullet physics simulation,
          <source>in: ACM SIGGRAPH 2015 Courses, SIGGRAPH '15</source>
          ,
          Association for Computing Machinery, New York, NY, USA,
          <year>2015</year>
          . doi:10.1145/2776880.2792704.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>N.</given-names>
            <surname>Koenig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <article-title>Design and use paradigms for Gazebo, an open-source multi-robot simulator</article-title>
          ,
          <source>in: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566)</source>
          , volume
          <volume>3</volume>
          ,
          <year>2004</year>
          , pp.
          <fpage>2149</fpage>
          -
          <lpage>2154</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Quigley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Conley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gerkey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Faust</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Foote</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leibs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wheeler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          , et al.,
          <article-title>Ros: an open-source robot operating system</article-title>
          ,
          <source>in: ICRA workshop on open source software</source>
          , volume
          <volume>3</volume>
          ,
          Kobe
          , Japan,
          <year>2009</year>
          , p.
          <fpage>5</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Rohmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P. N.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Freese</surname>
          </string-name>
          ,
          <article-title>Coppeliasim (formerly v-rep): a versatile and scalable robot simulation framework</article-title>
          ,
          <source>in: Proc. of The International Conference on Intelligent Robots and Systems (IROS)</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Webots</surname>
          </string-name>
          , open-source
          <source>Mobile Robot Simulation Software</source>
          . URL: http://www.cyberbotics.com.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pinciroli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Trianni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>O'Grady</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brutschy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brambilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mathews</surname>
          </string-name>
          , E. Ferrante,
          <string-name>
            <given-names>G. Di</given-names>
            <surname>Caro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ducatelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Birattari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Gambardella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dorigo</surname>
          </string-name>
          ,
          <article-title>Argos: a modular, parallel, multi-engine simulator for multi-robot systems</article-title>
          ,
          <source>Swarm intelligence 6</source>
          (
          <year>2012</year>
          )
          <fpage>271</fpage>
          -
          <lpage>295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>G.</given-names>
            <surname>Massera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ferrauto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Gigliotta</surname>
          </string-name>
          , S. Nolfi,
          <article-title>FARSA: An Open Software Tool for Embodied Cognitive Science</article-title>
          ,
          <source>in: Proceedings of the 12th European Conference on Artificial Life (ECAL</source>
          <year>2013</year>
          ),
          <year>2013</year>
          , pp.
          <fpage>538</fpage>
          -
          <lpage>545</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paszke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          , G. Chanan,
          <string-name>
            <given-names>T.</given-names>
            <surname>Killeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gimelshein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antiga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Desmaison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Köpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>DeVito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Raison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tejani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chilamkurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Steiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chintala</surname>
          </string-name>
          ,
          <article-title>PyTorch: An imperative style, high-performance deep learning library</article-title>
          ,
          <year>2019</year>
          . arXiv:1912.01703.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Meurer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Paprocki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Čertík</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Kirpichev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rocklin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ivanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rathnayake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. E.</given-names>
            <surname>Granger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. P.</given-names>
            <surname>Muller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bonazzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vats</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Johansson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Curry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Terrel</surname>
          </string-name>
          , v. Roučka,
          <string-name>
            <given-names>A.</given-names>
            <surname>Saboo</surname>
          </string-name>
          , I. Fernando,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kulal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cimrman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Scopatz</surname>
          </string-name>
          ,
          <article-title>Sympy: symbolic computing in python</article-title>
          ,
          <source>PeerJ Computer Science</source>
          <volume>3</volume>
          (
          <year>2017</year>
          )
          e103. doi:10.7717/peerj-cs.103.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cook</surname>
          </string-name>
          ,
          <article-title>CUDA Programming: A Developer's Guide to Parallel Computing with GPUs</article-title>
          , 1st ed., Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>R.</given-names>
            <surname>Akrour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tateo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <article-title>Continuous action reinforcement learning from a mixture of interpretable experts</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>44</volume>
          (
          <year>2021</year>
          )
          <fpage>6795</fpage>
          -
          <lpage>6806</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dankwa</surname>
          </string-name>
          , W. Zheng,
          <article-title>Twin-delayed ddpg: A deep reinforcement learning technique to model a continuous movement of an intelligent robot agent</article-title>
          ,
          <source>in: Proceedings of the 3rd international conference on vision, image and signal processing</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pagliuca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Milano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nolfi</surname>
          </string-name>
          ,
          <article-title>Efficacy of modern neuro-evolutionary strategies for continuous control optimization</article-title>
          ,
          <source>Frontiers in Robotics and AI</source>
          <volume>7</volume>
          (
          <year>2020</year>
          )
          <fpage>98</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>D.</given-names>
            <surname>Reda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tao</surname>
          </string-name>
          , M. van de Panne,
          <article-title>Learning to locomote: Understanding how environment design matters for deep reinforcement learning</article-title>
          ,
          <source>in: Proceedings of the 13th ACM</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>