<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Game⋆</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Mari</string-name>
          <email>lmari@liuc.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Workshop</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Large Language Model, LLM, Agent-based Modelling, Sustainability Game, Artificial Intelligence, AI</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for the Study of Existential Risk, University of Cambridge</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Francesco Bertolotti</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Intelligence, Complexity, and Technology Lab (ICT Lab), University Cattaneo</institution>
          ,
          <addr-line>LIUC</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Joz̆ef Stefan Institute</institution>
          ,
          <country country="SI">Slovenia</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>School of Industrial Engineering, University Cattaneo</institution>
          ,
          <addr-line>LIUC</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents an agent-based model (ABM) of a sustainability game in which each agent is powered by a Large Language Model (LLM). The simulation model explores how LLM-based agents manage the tension between short-term competitive advantage and long-term ecological sustainability. By embedding agents in a resource-constrained environment-featuring renewable and non-renewable assets, military conflict, and shared environmental limits-the paper investigates whether and under what conditions LLMs can adopt sustainable behaviors. Several experimental scenarios are evaluated with diferent strategies endowed to agents, also varying the number of agents, the connectivity of the relationship network and forecast length. Results show that LLM agents can more likely achieve sustainable collective outcomes when unguided or when provided with explicitly sustainable strategies. Also, explicit strategies significantly influence system dynamics-occasionally leading to ecological collapse or aggressive domination. Findings suggest that even shallow behavioral priors can steer LLM-based agents toward or away from sustainability, and that tests of this kind may serve as valuable tools for assessing alignment and coordination in multi-agent LLM systems. Moreover, the results provide insight to confirm that LLM-enhanced ABMs could be used in sustainability issues.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In recent years, there has been growing interest in agents, particularly those based on generative
artificial systems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This attention is not coincidental; rather, it is well justified, as LLMs [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] are
proving to be transformative systems—at least in terms of responsiveness—across a wide range of
domains [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. As these models increasingly influence real-world processes, it becomes essential the
investigation of their behavior in controlled, multi-agent experiments [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Such studies could ofer
valuable insights into how these systems might act in complex and socially relevant scenarios that
could one day become pressing in practical applications [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
      </p>
      <p>
        Among these scenarios, the most relevant in the current landscape of international politics are
the dynamics of inter-nation competition and the challenge of sustainability [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In particular,
geopolitical and military rivalry between distinct entities often comes at the expense of natural
resource consumption—what we might broadly refer to as the biosphere—which, to some extent, is
a shared domain among all actors [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], as it often happens in competition scenario [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]. In this
regard, the modeled scenario can be interpreted as a competitive extension of the classic tragedy of the
commons dilemma, where short-term strategic advantage conflicts with the long-term preservation of
⋆You can use this document as the template for preparing your publication. We recommend using the latest version of the
      </p>
      <p>CEUR</p>
      <p>ceur-ws.org</p>
      <p>
        In this paper, we employed a previously developed game designed to elicit the tension
between short-term competition and long-term sustainability [
        <xref ref-type="bibr" rid="ref15 ref8">8, 15</xref>
        ]. In earlier versions of the game,
traditional agents were used—either evolving strategies over time or adjusting their preferences
based on the dynamics of certain system properties. Here, we tested how LLMs compete within the
same framework [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], with a focus on understanding the conditions under which they are capable of
achieving sustainable behavior, and when they fail to do so [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. In this way, it is both a methodological
and domain-specific work [
        <xref ref-type="bibr" rid="ref18">18, 19</xref>
        ].
      </p>
      <p>In particular, we designed an ABM of the sustainability game in which each agent is powered by an
LLM [20, 21]. This setup enabled both the repetition of simulations and the exploration of diferent
experimental scenarios, allowing us to gather insights into the behavior of these systems within the
specified context [ 22]. While such models represent a cutting-edge frontier in simulation research,
they also come with notable limitations—most prominently, their lack of explainability and high
computational cost—which complicate their practical use [23]. A secondary objective of this work was
therefore to reflect on these challenges and contribute to the broader scientific debate surrounding the
role of LLMs in agent-based modeling.</p>
      <p>The findings of this paper are twofold. On one hand, we validated that a sustainability
scenario can be efectively studied using not a generic ABM [ 24], but a model in which each agent is
powered by an LLM. This was supported by three observations: the model operated in a coherent
and reasonable manner; the results aligned with those obtained in prior studies using traditional
agent-based approaches; and the variation in strategies produced outcomes that were intuitive and
consistent with theoretical expectations.</p>
      <p>On the other hand, we identified two key insights regarding the use of LLMs in multi-agent systems.
First, when left unguided—without explicit strategies—LLMs are, under certain conditions, able to find a
balance both with one another and with the environment [24]. This is particularly true when suficient
information is available for making informed decisions and when the network is sparse enough that
short-term competition does not dominate survival dynamics [25]. Second, however, introducing
explicit behavioral guidance into the system prompt—defining not only the rules of the game but also
how agents should play—can dramatically alter this balance, and the sensitivity of the overall system to
their instructions [22]. It can lead, for instance, to populations that engage in relentless conflict until
only one agent remains, or to groups that fail to manage resource depletion efectively, resulting in
system collapse [26, 27].</p>
      <p>The paper is structured as follows. First, the methodology is presented, including a brief overview of
the sustainability game, a detailed description of the agent-based model, and the experimental design
used. Next, the results from the various experiments are reported. This is followed by a discussion of
the findings and the implications they raise. Finally, conclusions are drawn based on the observed
outcomes.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>The methodology is divided into three parts. First, the sustainability game is briefly described. Second,
its implementation in an LLM-enhanced ABM is presented. Finally, the experimental setup is shown,
with everything needed to guarantee replicability of the results.</p>
      <sec id="sec-2-1">
        <title>2.1. Sustainability game description</title>
        <p>The sustainability game presents a stylized competitive environment in which agents (players)
pursue either short-term gains or long-term survival strategies. Each player manages an inventory of
resources represented by four types of colored blocks: green (sustainable industrial capacity), black
(non-sustainable industrial capacity), red (military power), and brown (biosphere). The green and black
blocks stand for the industrial capability of the player [28]. The brown blocks are a shared, finite stock
representing environmental capital and are not owned by any individual.</p>
        <p>Players begin with equal numbers of green, black, and red blocks, while brown blocks are
centrally managed. The game unfolds in discrete turns, during which players may produce new
blocks and choose whether to attack other agents. Production rules determine how existing blocks
can be transformed into others, with green blocks enabling sustainable production and black
blocks enabling more profitable but environmentally damaging pathways. Players may also produce
red blocks (military) from either green or black sources. Figure 1 depicts players production possibilities.
Aggressive actions are optional but strategic: players may attack one neighbor per turn,
potentially seizing their industrial capacity. Combat depletes both players’ red blocks, but if the attacker
has any remaining, they appropriate the defender’s black and green blocks. A player is eliminated if
they lose all industrial capacity.</p>
        <p>The biosphere (brown blocks) is depleted based on the number of non-sustainable (black and
red) blocks held across all players. Green blocks can ofset red blocks, reducing ecological degradation.
If all brown blocks are exhausted, all players lose, indicating ecological collapse. The game ends when
only one player remains, when the final turn is reached with multiple survivors, or when the biosphere
is fully depleted.</p>
        <p>
          Victory can be individual (via domination), collective (survival without collapse), or null (collapse).
The game thus induces a tension between competitive strategies that yield rapid advantage but accelerate
collapse, and cooperative or foresighted strategies that favor mutual long-term survival. Players must
balance exploitation, conflict, and sustainability to navigate the shifting trade-ofs embedded in the
system. A more comprehensive description of the game can be found in previous works [
          <xref ref-type="bibr" rid="ref15 ref8">15, 8</xref>
          ]
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. LLM-enhanced agent-based model</title>
        <p>In this work, we modeled the game as an agent-based model, so employing a computational simulation
framework in which individual entities, known as agents, operate according to predefined behavioral
rules and interact within an environment [29, 30]. Each player is represented as an autonomous
entity equipped with an explicit objective. Each agent interacts with its environment through a set of
actuators that allow it to pursue its goal, and perceives changes in the environment through dedicated
sensors. To clarify the agent-based implementation, we define the four fundamental components of
each agent: environment, sensors, actuators, and internal states [31]. These elements jointly determine
the agent’s behavior and its capacity to adapt to dynamic conditions within the simulated system.
The environment of each agent consists of the state of the biosphere—represented by the number of
brown blocks—and the states of the agents to which it is connected. The initial number of brown blocks
is  0. We introduce the concept of relation because agents are embedded in a relational network: they
can only perceive and interact with those agents to whom they are directly linked. The number of links
 per agent is a tunable parameter of the model.</p>
        <p>Within this environment, an agent can perceive two types of information in each of the  time-step of
the simulation: the block composition of its neighbors (i.e., their industrial and military capacities) and
the global state of the biosphere. No other information is accessible. This stylization is also essential to
ensure that LLMs used by agents to take decisions focus on the most relevant and available information.
The third component is the set of actuators. Each agent has two possible actions: deciding how much
to produce and choosing whether to attack, and whom. The production decision has a direct efect
on the agent’s own blocks, updating them accordingly. The attack decision, instead, afects both the
attacker’s and the target neighbor’s states. These internal states include the number of black, green,
and red blocks, as well as a memory of the past   states of the biosphere.</p>
        <p>Each agent is endowed of a memory of a given length   to make more informed decisions—ones
that are not solely reactive to the present state but also consider recent trends in the environment.
Specifically, from the starting point, agents are able to perform a prediction based on a linear
extrapolation for the following   time steps. However, unlike traditional ABMs where decisions are
made using predefined empirical rules or simple neural networks trained on specific scenarios, our
model delegates decision-making to LLM. Since agents face two distinct decision types—production
and attack—we designed two separate prompts tailored to each action. This separation is necessary
to guide the LLM appropriately, but it also introduces a limitation of the model: to ensure faster
response times, agents do not retain a history of their own past actions. Instead, they rely only
on their current internal states and environmental cues. As a result, agents cannot coordinate
production and attack choices in a fully integrated strategic manner. Nonetheless, both decision
prompts share a common system prompt header, which defines the agent’s general behavior and context.
In the case of production decisions, the prompt explicitly lists and explains the seven possible
actions along with their respective efects on the agent’s internal state. For attack decisions, the prompt
describes the general consequences of an attack, outlining its potential impact without referencing
specific agents. In both cases, the prompts include clear instructions on the expected output format to
ensure consistent and interpretable responses from the LLM.</p>
        <p>The cognitive capacity of agents in this type of model can be modulated in two primary ways. The first
is by selecting a more or less capable LLM, or, if needed, by fine-tuning a specific model to better align
its behavior with the desired decision-making patterns. The second approach involves providing the
model with a richer set of input information, thereby enabling progressively more informed decisions.
However, this strategy faces diminishing returns due to limitations such as attention bias and the finite
number of input tokens that can be processed by an LLM. Furthermore, it is important to acknowledge
a fundamental constraint: LLMs are inherently language-based models and, when used in isolation, are
not equipped to perform sophisticated quantitative predictions based on structured data.
In all cases, the models follow predefined strategies. An initial experiment was conducted in
which strategies were generated directly using LLMs; this exploratory phase is documented and
included in the online repository alongside the full implementation of the model. The repository
also contains the results of this preliminary experiment, providing insight into the capabilities and
limitations of strategy generation via language models.
Summary of key simulation parameters used in the agent-based implementation of the sustainability game.
Each agent interacts within a dynamic networked environment, perceives biosphere state changes, and makes
decisions based on internal memory and local information. The parameters define the scale ( ), memory capacity
(  ), relational embedding ( ), environmental awareness (  ,  0), and overall simulation duration ( ).</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Experimental design</title>
        <p>Several experiments were conducted, all sharing a common component: the systematic exploration
of the model’s parameter space. However, they difer in terms of the strategies employed by the
agents, allowing us to assess how variations in decision-making approaches interact with diferent
environmental and structural conditions.</p>
        <p>The parameter space was investigated with respect to three specific parameters.
The chosen
methodology for this exploration was random grid sampling, which involves generating specific
parameter combinations, each randomly drawn from a defined distribution. Given that the range of
parameter values was relatively narrow—with none exceeding a single order of magnitude—a uniform
probability distribution was applied across the entire parameter space. The three selected parameters
were: the number of players  , the number of links  connecting each agent to others, and the forecast
horizon length   . The first parameter was included to assess how the size of the agent population
influences the system’s ability to remain sustainable, holding the initial biosphere level  0 constant. The
second parameter was varied to explore how network connectivity afects the likelihood and intensity
of conflict between agents. The third parameter was designed to evaluate the impact of increased
information-processing capacity—particularly with regard to sustainability—on agent decision-making.
restricted the available strategies to only green and killer, while in the third, every agent was assigned
description
You produce as many red blocks possible. You ignore prediction about brown
decreasing. You attack multiple agents per time, only if you have much more reds then
them.</p>
        <p>You produce some red blocks for self defense, just few. You never attack. You make
sure you have more green than blacks and green combined. If brown predictions
are low, you convert all your blacks into greens and use the greens only to produce
browns.</p>
        <p>You produce only green blocks and very few red blocks. You attack only if you are
certain to win. You never create black blocks. You use the greens mostly to create as
many browns as possible.</p>
        <p>You first produce as many blacks as you can. Then you produce a lot of reds. You
always want to have more reds than the average. If the prediction for the browns
gets negative, you stop everything else, convert all the blacks into green and use all
the green to produce browns.</p>
        <p>You produce a balanced number of greens, blacks and reds. You use all the green you
have always to create new browns. If the prediction for browns get negative, attack
everyone with all you reds and convert your blacks into greens.</p>
        <p>You do not care about the depletion of green blocks. You just want to produce half
of your capacity in blacks and the other half in reds. You attack always neighbors
weaker than you.</p>
        <p>You always attack whoever has more blacks than you.
exclusively the green strategy. This design is, to some extent, limiting, as a more comprehensive
analysis would ideally examine each individual strategy as well as all possible pairwise combinations.
However, given the exploratory and preliminary nature of this study, the primary objective was to
assess the feasibility and conceptual validity of this modeling approach. Consequently, the experimental
setup was intentionally kept simple and focused.</p>
        <p>As the system’s behavioral model, we exclusively used GPT-4o-mini, in the version available
as of April 20, 2025. This choice was motivated by three main factors. First, the model ofers
significantly faster response times compared to other flagship models released by OpenAI at the time of
the experiment, and its latency is comparable to models from other providers such as Anthropic, Google,
and DeepSeek. Moreover, even considering the use of an open-source model, inference remains a
critical bottleneck, as it would require GPU resources for a nontrivial amount of time. Even with access
to such hardware, inference would likely be slower overall, and executing calls in parallel would have
been considerably more dificult. Second, GPT-4o-mini demonstrated strong performance on standard
intelligence benchmarks, and a preliminary assessment indicated that it appeared to understand the
decisions it was making. This made the model particularly compelling for the purpose of our study.
This assessment involved prompting the LLM to explain its production and attack choices in context,
in order to evaluate whether it was responding randomly or demonstrating some level of situational
awareness. While whether this awareness constitutes true understanding lies beyond the scope of this
work, the results were nonetheless promising. Third, the model’s low operational cost was a decisive
factor. Since it does not require dedicated GPUs and relies solely on API calls, the cost of running
experiments with GPT-4o-mini was suficiently low to make the study afordable within our available resources.
In evaluating the experiment, we set the temperature of the LLM to zero in order to ensure
full replicability and thereby increase the scientific rigor of the results. However, the use of LLMs to
develop agent-based models raises a number of open methodological concerns. The first and most
(a) Mean number of turns survived by agents under each
of the seven strategies.
evident is their black-box nature. Neither the contents of GPT-4o-mini’s training set nor the retention
level of any specific information are publicly known. Since the game modeled in this study can be
viewed as a competitive variant of the commons dilemma—a scenario known to GPT-4o-mini—it is
impossible to determine whether prior exposure to such problems influenced the model’s responses, or,
more critically, what the nature of that influence may have been, if any.</p>
        <p>The second major concern is replicability. As long as one relies on a proprietary, closed-source model,
scientific replication is only possible for as long as that specific model version remains accessible
via API. This does not invalidate the scientific value of such work—science often progresses through
iterative experimentation and occasional error—but it does highlight a fundamental limitation. We
believe it is essential to clearly acknowledge this limitation, as it touches on the core of what makes a
result verifiable and robust.</p>
        <p>The model implementation, experimental procedures, and data analysis were all conducted
using Python 3.11.3. The full model code used to generate the results’ experiments is available at the
following https://anonymous.4open.science/r/LLM_sustainability_game-1C88/.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>Three distinct scenarios can be identified in all the experiments:  1 corresponds to the case in which
all agents go extinct before reaching the end of the simulation;  2 describes the situation where a
single agent is able to militarily defeat all others and emerge as the sole survivor; and  3 represents the
outcome in which multiple agents successfully survive until the final time step.</p>
      <sec id="sec-3-1">
        <title>3.1. Multiple strategies</title>
        <p>median simulation length, suggesting faster collapse or convergence. The variability is higher at lower
player counts, with a few outlier runs reaching very high survival times. As the number of players
increases, the outcome becomes more tightly clustered around earlier termination points. This suggests
that larger populations may accelerate competitive dynamics, leading to quicker system destabilization.
Also, a greater population using a fixed amount of resources can consume them quicker.</p>
        <p>Figure 3 illustrates both individual and aggregate agent behavior over time for a simulation that
concludes with lifestock exhaustion. Panel 3a shows trajectories in resource accumulation across
agents, highlighting divergence in success despite shared initial conditions. These diferences could be
accounted both individuals’ and neighbors’ strategies. In panel 3b, the aggregate dynamics reveal a
depletion of brown resources which is been mitigated towards the end, given that the agents adjust
their action according to the prediction regarding the brown blocks. This can be seen also with the
dominance of green blocks in the second part of the simulation, suggesting a systemic shift toward
sustainable production. The number of agents decreases slightly over time, as the results of competition.</p>
        <p>Figure 4 illustrates the system’s behavior under conditions leading to long-term sustainability (which
is the  3 scenario). As shown in panel 4a, the individual agents rapidly accumulate green resources while
black and red resources diminish early. In panel 4b, which depics the aggregate value of the blocks in
the system, it is possible to observe that the total number of brown blocks increases over time, indicating
a regenerative dynamic possibly due to restrained exploitation. This was possible also because the
number of players stabilizes at two, suggesting early extinction of others – especially the one with
more aggressive strategies – could lead to long-term coexistence between survivors. Decision patterns
confirm a dominant reliance on green-to-green actions, aligning with the observed environmental
recovery.</p>
        <p>Table ?? reports summary statistics for the three outcome scenarios  1,  2, and  3 in the multiple
strategies experiment. Scenario  3, associated with agent coexistence, features the lowest average
number of links  and the longest duration   , suggesting less dense networks may promote sustainability.
In contrast,  1 and  2—associated with extinction and domination—occur at higher link densities. The
standard deviation in  indicates greater variability in population outcomes under  1 and  2. These
ifndings point to a potential trade-of between connectivity and system stability.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Two strategies</title>
        <p>Figure 5b reveals a pattern consistent with that of Figure 2b, indicating that the average simulation
length remains largely unafected by the reduction in available strategies from seven to two—at least for
the specific strategies examined. In contrast, Figure 5a shows a marked change in individual outcomes:
the average survival time for agents using the killer strategy remains stable, while that of the green
strategy increases significantly. This improvement can be attributed to the higher likelihood of green
agents encountering similarly non-aggressive neighbors, reducing the risk of early elimination.</p>
        <p>Figure 6 illustrates a dynamic where both green and red strategies gain traction early, but only green
sustains long-term growth in resource accumulation. Panel 6b shows that green blocks eventually
dominate, while red and black decline or fluctuate. The number of players rapidly decreases, with only
one surviving by turn ten, suggesting an unstable competitive environment that arrives to a trivial
equilibrium. Brown resources steadily deplete, indicating over-exploitation, which does not lead to
collapse only because a single player were able to win rapidly enough to avoid it.</p>
        <p>(a) Individual agents’ black, green, and red block counts over time in an extinction ( 1) run.
(b) Total number of blocks (of each color), the surviving agent count, and decision frequencies during  1.</p>
        <p>(a) Individual agents’ black, green, and red block counts over time in an extinction ( 3) run.
(b) Total number of blocks (of each color), the surviving agent count, and decision frequencies during  3.
(a) Average survival time for “killer” vs. “green”
strategies in the two‐strategy experiment.</p>
        <p>Comparing the two tables, we observe that the average number of players  is significantly higher in
the two-strategy experiment than in the multiple-strategy one, particularly in scenario  3, suggesting
that limiting strategic diversity may foster greater agent survival. Conversely, the average number
of links  is substantially lower in the two-strategy setup across all scenarios, indicating that sparser
networks are associated with extended coexistence. The final simulation turn   is relatively consistent
across experiments, with slightly higher values in  3 for both settings. Overall, the reduction in strategic
complexity appears to simplify interactions and promote more stable outcomes.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Single strategy</title>
        <p>In this case, where all agents adopt the green strategy, only scenario  3 emerged across all 50 simulations.
This outcome is far from trivial. Although agents can still attack one another under the green strategy,
the dynamics suggest that the level of aggression remains insuficient to lead to complete elimination.
Moreover, while agents do generate red blocks, the overall production does not appear to be high
enough to exhaust the brown resources. This is supported by the distribution of brown blocks at the
end of the simulation shown in Figure 7b, where no run results in zero remaining brown blocks, and
several simulations end with a very high number of them.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. No strategy</title>
        <p>The final case is not a formal experiment in the same sense as the others, as it was not pre-designed
within the same framework. However, we chose to include it after observing the influence of strategy
on agent survival. Here, agents are assigned no predefined strategy at all, allowing us to observe the
system’s behavior in the absence of structured decision-making. This approach not only complements
the previous results—demonstrating conditions under which populations thrive with diferent
strategies—but also serves as a preliminary step toward a kind of psychomatics: an exploration of the vital
behavioral principles emerging from machines capable of cognitively complex tasks.</p>
        <p>In this case, scenario  3 occurred in 38.78% of simulations, while the remaining runs resulted in
scenario  1; notably, scenario  2 never emerged. That is, in the absence of predefined strategies, there
was no instance in which a single agent succeeded in—or even attempted to—eliminate all others. This
result is particularly intriguing: it suggests that when left unguided, agents driven by GPT-4o-mini may
spontaneously find a balance both with each other and with the environment, at least with a greater
probability. Of course, in the majority of cases, such balance does not arise. Table 4 ofers a possible
(a) Individual agents’ black, green, and red block counts over time in an extinction ( 2) run.
(b) Total number of blocks (of each color), the surviving agent count, and decision frequencies during  2.
(a) Mean (±SD) of population size ( ), network
connectivity (ℓ), and duration (  ) for each outcome in the
two-strategy experiment.</p>
        <p>(b) Histogram of remaining brown blocks at
simulation end, showing that no run fully depleted the
biosphere under the green-only strategy.
explanation, consistent with earlier findings: surviving agent populations tend to have higher foresight
capabilities and lower connectivity. This implies they are better at managing brown resources and less
likely to engage in destructive interactions.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and conclusion</title>
      <p>
        This paper revisits a sustainability game from the existing literature [
        <xref ref-type="bibr" rid="ref15 ref8">15, 8</xref>
        ], in which a long-term
commons dilemma is set against short-term competitive dynamics among players. The
original agent-based implementation is extended by replacing traditional agents with ones driven
by LLMs. This shift serves a dual purpose: first, to gain deeper insight into how LLM-based
agents behave when faced with scarce renewable and non-renewable resources under competitive
pressure; and second, to explore the potential of using LLMs as the cognitive core of agents within ABMs.
The second objective was easily achieved: the results obtained in this study are consistent—though
not identical—with those reported in previous work modeling the same game. In particular, we
observed that, as in the earlier study, the agents’ ability to process more information significantly
increases their likelihood of survival. Additionally, the structure of the interaction network plays
a critical role in determining which agents persist over time, shaping the mix of strategies that
remains in the system. Finally, the ratio between available resources and the number of players also
influences the system’s long-term sustainability, confirming its relevance as a key factor in the dynamics.
Regarding the first objective, we found that under mixed-strategy conditions, there are only
limited instances in which agents are able to collectively play and win a sustainability game that
involves a tension between short-term competition and long-term resource preservation. However, the
results also show that when agents are guided by an explicitly encoded strategy—defined at the level of
the system prompt, and thus not as deeply embedded as through dedicated training—they can behave
in a non-aggressive manner and avoid destroying the environment they inhabit. This suggests that
even shallow behavioral priors, if well designed, can be suficient to steer agent populations toward
more sustainable outcomes.
      </p>
      <p>In this sense, we can conclude that the ability to manage a commons dilemma in a
competitive setting does not depend solely on the intrinsic behavior of the LLM, but also on how it is explicitly
instructed. However, the analysis of agent behavior without explicit strategies revealed a noteworthy
result: even in the absence of predefined guidance, LLM-based agents were often capable of reaching
the end of the simulation and adopting sustainable strategies. This outcome is far from trivial and
raises important questions. Does it indicate that these models exhibit inherently sustainable behavior
in dynamic environments? Or is it simply a consequence of their prior exposure to similar scenarios
during training—for example, through scientific articles on sustainability games?
While we partially addressed this by explicitly asking the model whether it recognized the game, the
question remains open. What we can confidently take away from this study are two key insights.
First, that an LLM, when left unguided, can spontaneously coordinate with other LLMs to achieve a
sustainable collective behavior. Second, that this emergent sustainability is fragile: it holds only in the
absence of explicit instructions, and can be overridden by a suficiently influential system prompt. This
suggests that tests of this kind could serve as valuable tools for assessing the strategic alignment of
LLMs in multi-agent contexts.</p>
      <p>Finally, there is one additional aspect worth discussing. From the perspective of dynamic
system modeling, one non-trivial challenge is assessing whether an LLM can act consistently within a
dynamic environment—one that evolves over time and whose state depends on its own history. In this
study, we observed that LLM-based agents, even without an extensive set of explicit instructions, were
able to operate reasonably well in such contexts, at least in relatively simple scenarios. This finding
is encouraging, as it suggests that LLMs may possess a degree of temporal coherence suficient for
interacting with non-static environments. Nonetheless, further research is required to determine the
level of system complexity and dynamism under which an LLM can still behave efectively and reliably.
A possible extension of this work would be to assess whether the results hold when using
other LLMs of comparable intelligence, thereby validating that the findings are not specific to OpenAI’s
models [32]. Additionally, it would be valuable to investigate the minimum intelligence threshold
below which an LLM can no longer efectively participate in the game—failing not only to pursue
meaningful strategies but also to produce valid outputs. Further experimentation could also focus on
the second cognitive dimension: the quantity and quality of information provided to the model. We
identify a relationship between the collapse probability and the amount of information provided, but a
more detailed analysis of how access to diferent types of input afects behavior could be performed
[33]. Finally, the impact of specific strategies warrants deeper investigation, including the identification
of potentially optimal strategy combinations and the emergence of collective phenomena from their
interactions [34].</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This publication is supported by the European Union’s Horizon Europe research and innovation
programme under the Marie Skłodowska-Curie Postdoctoral Fellowship Programme, SMASH co-funded
under the grant agreement No. 101081355. The operation (SMASH project) is co-funded by the Republic
of Slovenia and the European Union from the European Regional Development Fund.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have employed Generative AI tools to support code writing, refine the language, and
proofread the final version of the text.
[19] S. Roman, F. Bertolotti, A master equation for power laws, Royal Society open science 9 (2022)
220531.
[20] Y.-S. Chuang, A. Goyal, N. Harlalka, S. Suresh, R. Hawkins, S. Yang, D. Shah, J. Hu, T. T. Rogers,
Simulating opinion dynamics with networks of llm-based agents, arXiv preprint arXiv:2311.09618
(2023).
[21] Ö. Gürcan, Llm-augmented agent-based modelling for social simulations: Challenges and
opportunities, HHAI 2024: Hybrid Human AI Systems for the Social Good (2024) 134–144.
[22] F. Bertolotti, A. Locoro, L. Mari, Sensitivity to initial conditions in agent-based models, in:
MultiAgent Systems and Agreement Technologies: 17th European Conference, EUMAS 2020, and 7th
International Conference, AT 2020, Thessaloniki, Greece, September 14-15, 2020, Revised Selected
Papers 17, Springer, 2020, pp. 501–508.
[23] R. Occa, F. Bertolotti, et al., Understanding the efect of iot adoption on the behavior of firms:
An agent-based model, in: CS &amp; IT Conference Proceedings, volume 12, CS &amp; IT Conference
Proceedings, 2022.
[24] S. Roman, F. Bertolotti, Global history, the emergence of chaos and inducing sustainability in
networks of socio-ecological systems, Plos one 18 (2023) e0293391.
[25] F. Bertolotti, F. Schettini, L. Ferrario, D. Bellavia, E. Foglia, A prediction framework for
pharmaceutical drug consumption using short time-series, Expert systems with applications 253 (2024)
124265.
[26] S. Roman, Historical dynamics of the chinese dynasties, Heliyon 7 (2021).
[27] S. Roman, Theories and models: Understanding and predicting societal collapse, in: The Era of
Global Risk: An Introduction to Existential Risk Studies, Open Book Publishers, 2023, pp. 27–54.</p>
      <p>URL: https://doi.org/10.11647/OBP.0336.02.
[28] N. Saporiti, V. Cannas, G. Pirovano, R. Pozzi, T. Rossi, Barriers and enablers to the implementation
of digital twins in manufacturing companies: A literature review (2020), Proceedings of the
Summer School Francesco Turco (2020).
[29] E. Bonabeau, Agent-based modeling: Methods and techniques for simulating human systems,</p>
      <p>Proceedings of the national academy of sciences 99 (2002) 7280–7287.
[30] F. Bertolotti, N. Kadera, L. Pasquino, L. Mari, An epidemiological extension of the el farol bar
problem, Frontiers in Big Data 8 (2025) 1519369.
[31] C. M. Macal, M. J. North, Agent-based modeling and simulation, in: Proceedings of the 2009
winter simulation conference (WSC), IEEE, 2009, pp. 86–98.
[32] F. Bertolotti, L. Mari, An llm-based delphi study to predict genai evolution, arXiv preprint
arXiv:2502.21092 (2025).
[33] F. Bertolotti, F. Schettini, F. Asperti, E. Foglia, A gravity model for emergency departments,</p>
      <p>Scientific reports 15 (2025) 19537.
[34] F. Bertolotti, R. Occa, “roads? where we’re going we don’t need roads.” using agent-based
modeling to analyze the economic impact of hyperloop introduction on a supply chain, in:
European Conference on Multi-Agent Systems, Springer, 2020, pp. 493–500.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Dynamic llm-agent network: An llm-agent collaboration framework with agent team optimization</article-title>
          ,
          <source>arXiv preprint arXiv:2310.02170</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tenenholtz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Alvarez-Melis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Fusi</surname>
          </string-name>
          ,
          <article-title>Tag-llm: Repurposing general-purpose llms for specialized domains</article-title>
          ,
          <source>arXiv preprint arXiv:2402.05140</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Piao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , et al.,
          <article-title>Agentsociety: Large-scale simulation of llm-driven generative agents advances understanding of human behaviors and society</article-title>
          ,
          <source>arXiv preprint arXiv:2502.08691</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <surname>W.</surname>
          </string-name>
          <article-title>Zhang, Multi-llm-agent systems: Techniques and business perspectives</article-title>
          ,
          <source>arXiv preprint arXiv:2411.14033</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sreedhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cai</surname>
          </string-name>
          , J. Ma,
          <string-name>
            <given-names>J. V.</given-names>
            <surname>Nickerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>Chilton</surname>
          </string-name>
          ,
          <article-title>Simulating cooperative prosocial behavior with multi-agent llms: Evidence and mechanisms for ai agents to inform policy decisions</article-title>
          ,
          <source>in: Proceedings of the 30th International Conference on Intelligent User Interfaces</source>
          ,
          <year>2025</year>
          , pp.
          <fpage>1272</fpage>
          -
          <lpage>1286</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O.</given-names>
            <surname>Osaulenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yatsenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Reznikova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rusak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Nitsenko</surname>
          </string-name>
          , et al.,
          <article-title>The productive capacity of countries through the prism of sustainable development goals: Challenges to international economic security and to competitiveness, Financial and credit activity problems of theory and practice 2 (</article-title>
          <year>2020</year>
          )
          <fpage>492</fpage>
          -
          <lpage>499</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bertolotti</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          <article-title>Roman, Balancing long-term and short-term strategies in a sustainability game</article-title>
          ,
          <source>Iscience</source>
          <volume>27</volume>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bertolotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roman</surname>
          </string-name>
          ,
          <article-title>Risk sensitivity of production studios on the us movie market: an agentbased simulation</article-title>
          .,
          <source>in: WOA</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>210</fpage>
          -
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bertolotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roman</surname>
          </string-name>
          ,
          <article-title>Risk sensitive scheduling strategies of production studios on the us movie market: An agent-based simulation</article-title>
          ,
          <source>Intelligenza Artificiale</source>
          <volume>16</volume>
          (
          <year>2022</year>
          )
          <fpage>81</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>G. Hardin,</surname>
          </string-name>
          <article-title>The tragedy of the commons</article-title>
          ,
          <source>Science</source>
          <volume>162</volume>
          (
          <year>1968</year>
          )
          <fpage>1243</fpage>
          -
          <lpage>1248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. R.</given-names>
            <surname>Munro</surname>
          </string-name>
          ,
          <article-title>The economics of fishing and modern capital theory: a simplified approach</article-title>
          ,
          <source>Journal of environmental economics and management 2</source>
          (
          <year>1975</year>
          )
          <fpage>92</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Deadman</surname>
          </string-name>
          ,
          <article-title>Modelling individual behaviour and group performance in an intelligent agentbased simulation of the tragedy of the commons</article-title>
          ,
          <source>Journal of Environmental Management</source>
          <volume>56</volume>
          (
          <year>1999</year>
          )
          <fpage>159</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Piva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. W. C.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>Individual decision-making underlying the tragedy of the commons</article-title>
          ,
          <source>bioRxiv</source>
          (
          <year>2022</year>
          ). doi:
          <volume>10</volume>
          .1101/
          <year>2022</year>
          .11.29.518377.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bertolotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roman</surname>
          </string-name>
          ,
          <article-title>The evolution of risk sensitivity in a sustainability game: an agent-based model</article-title>
          .,
          <source>in: WOA</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>101</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Piatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kleiman-Weiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sachan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          ,
          <article-title>Cooperate or collapse: Emergence of sustainability behaviors in a society of llm agents</article-title>
          ,
          <source>arXiv preprint arXiv:2404.16698</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>R. M. Turner</surname>
          </string-name>
          ,
          <article-title>The tragedy of the commons and distributed AI systems</article-title>
          , Department of Computer Science, University of New Hampshire Durham,
          <string-name>
            <surname>NH</surname>
          </string-name>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>N.</given-names>
            <surname>Saporiti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Strozzi</surname>
          </string-name>
          , T. Rossi,
          <article-title>Digital twin relationship with virtual reality and augmented reality: a bibliometric review</article-title>
          ,
          <source>Proceedings of the Summer School Francesco Turco</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>