1. Introduction

1613-0073

Game⋆

Luca Mari

lmari@liuc.it 1 2 4

Workshop

Large Language Model, LLM, Agent-based Modelling, Sustainability Game, Artificial Intelligence, AI

0 Center for the Study of Existential Risk, University of Cambridge , UK 1 Francesco Bertolotti 2 Intelligence, Complexity, and Technology Lab (ICT Lab), University Cattaneo , LIUC , Italy 3 Joz̆ef Stefan Institute , Slovenia 4 School of Industrial Engineering, University Cattaneo , LIUC , Italy

This paper presents an agent-based model (ABM) of a sustainability game in which each agent is powered by a Large Language Model (LLM). The simulation model explores how LLM-based agents manage the tension between short-term competitive advantage and long-term ecological sustainability. By embedding agents in a resource-constrained environment-featuring renewable and non-renewable assets, military conflict, and shared environmental limits-the paper investigates whether and under what conditions LLMs can adopt sustainable behaviors. Several experimental scenarios are evaluated with diferent strategies endowed to agents, also varying the number of agents, the connectivity of the relationship network and forecast length. Results show that LLM agents can more likely achieve sustainable collective outcomes when unguided or when provided with explicitly sustainable strategies. Also, explicit strategies significantly influence system dynamics-occasionally leading to ecological collapse or aggressive domination. Findings suggest that even shallow behavioral priors can steer LLM-based agents toward or away from sustainability, and that tests of this kind may serve as valuable tools for assessing alignment and coordination in multi-agent LLM systems. Moreover, the results provide insight to confirm that LLM-enhanced ABMs could be used in sustainability issues.

1. Introduction

In recent years, there has been growing interest in agents, particularly those based on generative artificial systems [ 1 ]. This attention is not coincidental; rather, it is well justified, as LLMs [ 2 ] are proving to be transformative systems—at least in terms of responsiveness—across a wide range of domains [ 3 ]. As these models increasingly influence real-world processes, it becomes essential the investigation of their behavior in controlled, multi-agent experiments [ 4 ]. Such studies could ofer valuable insights into how these systems might act in complex and socially relevant scenarios that could one day become pressing in practical applications [ 5, 6 ].

Among these scenarios, the most relevant in the current landscape of international politics are the dynamics of inter-nation competition and the challenge of sustainability [ 7 ]. In particular, geopolitical and military rivalry between distinct entities often comes at the expense of natural resource consumption—what we might broadly refer to as the biosphere—which, to some extent, is a shared domain among all actors [ 8 ], as it often happens in competition scenario [ 9, 10 ]. In this regard, the modeled scenario can be interpreted as a competitive extension of the classic tragedy of the commons dilemma, where short-term strategic advantage conflicts with the long-term preservation of ⋆You can use this document as the template for preparing your publication. We recommend using the latest version of the

CEUR

ceur-ws.org

In this paper, we employed a previously developed game designed to elicit the tension between short-term competition and long-term sustainability [ 8, 15 ]. In earlier versions of the game, traditional agents were used—either evolving strategies over time or adjusting their preferences based on the dynamics of certain system properties. Here, we tested how LLMs compete within the same framework [ 16 ], with a focus on understanding the conditions under which they are capable of achieving sustainable behavior, and when they fail to do so [ 17 ]. In this way, it is both a methodological and domain-specific work [ 18, 19 ].

In particular, we designed an ABM of the sustainability game in which each agent is powered by an LLM [20, 21]. This setup enabled both the repetition of simulations and the exploration of diferent experimental scenarios, allowing us to gather insights into the behavior of these systems within the specified context [ 22]. While such models represent a cutting-edge frontier in simulation research, they also come with notable limitations—most prominently, their lack of explainability and high computational cost—which complicate their practical use [23]. A secondary objective of this work was therefore to reflect on these challenges and contribute to the broader scientific debate surrounding the role of LLMs in agent-based modeling.

The findings of this paper are twofold. On one hand, we validated that a sustainability scenario can be efectively studied using not a generic ABM [ 24], but a model in which each agent is powered by an LLM. This was supported by three observations: the model operated in a coherent and reasonable manner; the results aligned with those obtained in prior studies using traditional agent-based approaches; and the variation in strategies produced outcomes that were intuitive and consistent with theoretical expectations.

On the other hand, we identified two key insights regarding the use of LLMs in multi-agent systems. First, when left unguided—without explicit strategies—LLMs are, under certain conditions, able to find a balance both with one another and with the environment [24]. This is particularly true when suficient information is available for making informed decisions and when the network is sparse enough that short-term competition does not dominate survival dynamics [25]. Second, however, introducing explicit behavioral guidance into the system prompt—defining not only the rules of the game but also how agents should play—can dramatically alter this balance, and the sensitivity of the overall system to their instructions [22]. It can lead, for instance, to populations that engage in relentless conflict until only one agent remains, or to groups that fail to manage resource depletion efectively, resulting in system collapse [26, 27].

The paper is structured as follows. First, the methodology is presented, including a brief overview of the sustainability game, a detailed description of the agent-based model, and the experimental design used. Next, the results from the various experiments are reported. This is followed by a discussion of the findings and the implications they raise. Finally, conclusions are drawn based on the observed outcomes.

2. Methodology

The methodology is divided into three parts. First, the sustainability game is briefly described. Second, its implementation in an LLM-enhanced ABM is presented. Finally, the experimental setup is shown, with everything needed to guarantee replicability of the results.

2.1. Sustainability game description

The sustainability game presents a stylized competitive environment in which agents (players) pursue either short-term gains or long-term survival strategies. Each player manages an inventory of resources represented by four types of colored blocks: green (sustainable industrial capacity), black (non-sustainable industrial capacity), red (military power), and brown (biosphere). The green and black blocks stand for the industrial capability of the player [28]. The brown blocks are a shared, finite stock representing environmental capital and are not owned by any individual.

Players begin with equal numbers of green, black, and red blocks, while brown blocks are centrally managed. The game unfolds in discrete turns, during which players may produce new blocks and choose whether to attack other agents. Production rules determine how existing blocks can be transformed into others, with green blocks enabling sustainable production and black blocks enabling more profitable but environmentally damaging pathways. Players may also produce red blocks (military) from either green or black sources. Figure 1 depicts players production possibilities. Aggressive actions are optional but strategic: players may attack one neighbor per turn, potentially seizing their industrial capacity. Combat depletes both players’ red blocks, but if the attacker has any remaining, they appropriate the defender’s black and green blocks. A player is eliminated if they lose all industrial capacity.

The biosphere (brown blocks) is depleted based on the number of non-sustainable (black and red) blocks held across all players. Green blocks can ofset red blocks, reducing ecological degradation. If all brown blocks are exhausted, all players lose, indicating ecological collapse. The game ends when only one player remains, when the final turn is reached with multiple survivors, or when the biosphere is fully depleted.

Victory can be individual (via domination), collective (survival without collapse), or null (collapse). The game thus induces a tension between competitive strategies that yield rapid advantage but accelerate collapse, and cooperative or foresighted strategies that favor mutual long-term survival. Players must balance exploitation, conflict, and sustainability to navigate the shifting trade-ofs embedded in the system. A more comprehensive description of the game can be found in previous works [ 15, 8 ]

2.2. LLM-enhanced agent-based model

In this work, we modeled the game as an agent-based model, so employing a computational simulation framework in which individual entities, known as agents, operate according to predefined behavioral rules and interact within an environment [29, 30]. Each player is represented as an autonomous entity equipped with an explicit objective. Each agent interacts with its environment through a set of actuators that allow it to pursue its goal, and perceives changes in the environment through dedicated sensors. To clarify the agent-based implementation, we define the four fundamental components of each agent: environment, sensors, actuators, and internal states [31]. These elements jointly determine the agent’s behavior and its capacity to adapt to dynamic conditions within the simulated system. The environment of each agent consists of the state of the biosphere—represented by the number of brown blocks—and the states of the agents to which it is connected. The initial number of brown blocks is 0. We introduce the concept of relation because agents are embedded in a relational network: they can only perceive and interact with those agents to whom they are directly linked. The number of links per agent is a tunable parameter of the model.

Within this environment, an agent can perceive two types of information in each of the time-step of the simulation: the block composition of its neighbors (i.e., their industrial and military capacities) and the global state of the biosphere. No other information is accessible. This stylization is also essential to ensure that LLMs used by agents to take decisions focus on the most relevant and available information. The third component is the set of actuators. Each agent has two possible actions: deciding how much to produce and choosing whether to attack, and whom. The production decision has a direct efect on the agent’s own blocks, updating them accordingly. The attack decision, instead, afects both the attacker’s and the target neighbor’s states. These internal states include the number of black, green, and red blocks, as well as a memory of the past states of the biosphere.

Each agent is endowed of a memory of a given length to make more informed decisions—ones that are not solely reactive to the present state but also consider recent trends in the environment. Specifically, from the starting point, agents are able to perform a prediction based on a linear extrapolation for the following time steps. However, unlike traditional ABMs where decisions are made using predefined empirical rules or simple neural networks trained on specific scenarios, our model delegates decision-making to LLM. Since agents face two distinct decision types—production and attack—we designed two separate prompts tailored to each action. This separation is necessary to guide the LLM appropriately, but it also introduces a limitation of the model: to ensure faster response times, agents do not retain a history of their own past actions. Instead, they rely only on their current internal states and environmental cues. As a result, agents cannot coordinate production and attack choices in a fully integrated strategic manner. Nonetheless, both decision prompts share a common system prompt header, which defines the agent’s general behavior and context. In the case of production decisions, the prompt explicitly lists and explains the seven possible actions along with their respective efects on the agent’s internal state. For attack decisions, the prompt describes the general consequences of an attack, outlining its potential impact without referencing specific agents. In both cases, the prompts include clear instructions on the expected output format to ensure consistent and interpretable responses from the LLM.

The cognitive capacity of agents in this type of model can be modulated in two primary ways. The first is by selecting a more or less capable LLM, or, if needed, by fine-tuning a specific model to better align its behavior with the desired decision-making patterns. The second approach involves providing the model with a richer set of input information, thereby enabling progressively more informed decisions. However, this strategy faces diminishing returns due to limitations such as attention bias and the finite number of input tokens that can be processed by an LLM. Furthermore, it is important to acknowledge a fundamental constraint: LLMs are inherently language-based models and, when used in isolation, are not equipped to perform sophisticated quantitative predictions based on structured data. In all cases, the models follow predefined strategies. An initial experiment was conducted in which strategies were generated directly using LLMs; this exploratory phase is documented and included in the online repository alongside the full implementation of the model. The repository also contains the results of this preliminary experiment, providing insight into the capabilities and limitations of strategy generation via language models. Summary of key simulation parameters used in the agent-based implementation of the sustainability game. Each agent interacts within a dynamic networked environment, perceives biosphere state changes, and makes decisions based on internal memory and local information. The parameters define the scale ( ), memory capacity ( ), relational embedding ( ), environmental awareness ( , 0), and overall simulation duration ( ).

2.3. Experimental design

Several experiments were conducted, all sharing a common component: the systematic exploration of the model’s parameter space. However, they difer in terms of the strategies employed by the agents, allowing us to assess how variations in decision-making approaches interact with diferent environmental and structural conditions.

The parameter space was investigated with respect to three specific parameters. The chosen methodology for this exploration was random grid sampling, which involves generating specific parameter combinations, each randomly drawn from a defined distribution. Given that the range of parameter values was relatively narrow—with none exceeding a single order of magnitude—a uniform probability distribution was applied across the entire parameter space. The three selected parameters were: the number of players , the number of links connecting each agent to others, and the forecast horizon length . The first parameter was included to assess how the size of the agent population influences the system’s ability to remain sustainable, holding the initial biosphere level 0 constant. The second parameter was varied to explore how network connectivity afects the likelihood and intensity of conflict between agents. The third parameter was designed to evaluate the impact of increased information-processing capacity—particularly with regard to sustainability—on agent decision-making. restricted the available strategies to only green and killer, while in the third, every agent was assigned description You produce as many red blocks possible. You ignore prediction about brown decreasing. You attack multiple agents per time, only if you have much more reds then them.

You produce some red blocks for self defense, just few. You never attack. You make sure you have more green than blacks and green combined. If brown predictions are low, you convert all your blacks into greens and use the greens only to produce browns.

You produce only green blocks and very few red blocks. You attack only if you are certain to win. You never create black blocks. You use the greens mostly to create as many browns as possible.

You first produce as many blacks as you can. Then you produce a lot of reds. You always want to have more reds than the average. If the prediction for the browns gets negative, you stop everything else, convert all the blacks into green and use all the green to produce browns.

You produce a balanced number of greens, blacks and reds. You use all the green you have always to create new browns. If the prediction for browns get negative, attack everyone with all you reds and convert your blacks into greens.

You do not care about the depletion of green blocks. You just want to produce half of your capacity in blacks and the other half in reds. You attack always neighbors weaker than you.

You always attack whoever has more blacks than you. exclusively the green strategy. This design is, to some extent, limiting, as a more comprehensive analysis would ideally examine each individual strategy as well as all possible pairwise combinations. However, given the exploratory and preliminary nature of this study, the primary objective was to assess the feasibility and conceptual validity of this modeling approach. Consequently, the experimental setup was intentionally kept simple and focused.

As the system’s behavioral model, we exclusively used GPT-4o-mini, in the version available as of April 20, 2025. This choice was motivated by three main factors. First, the model ofers significantly faster response times compared to other flagship models released by OpenAI at the time of the experiment, and its latency is comparable to models from other providers such as Anthropic, Google, and DeepSeek. Moreover, even considering the use of an open-source model, inference remains a critical bottleneck, as it would require GPU resources for a nontrivial amount of time. Even with access to such hardware, inference would likely be slower overall, and executing calls in parallel would have been considerably more dificult. Second, GPT-4o-mini demonstrated strong performance on standard intelligence benchmarks, and a preliminary assessment indicated that it appeared to understand the decisions it was making. This made the model particularly compelling for the purpose of our study. This assessment involved prompting the LLM to explain its production and attack choices in context, in order to evaluate whether it was responding randomly or demonstrating some level of situational awareness. While whether this awareness constitutes true understanding lies beyond the scope of this work, the results were nonetheless promising. Third, the model’s low operational cost was a decisive factor. Since it does not require dedicated GPUs and relies solely on API calls, the cost of running experiments with GPT-4o-mini was suficiently low to make the study afordable within our available resources. In evaluating the experiment, we set the temperature of the LLM to zero in order to ensure full replicability and thereby increase the scientific rigor of the results. However, the use of LLMs to develop agent-based models raises a number of open methodological concerns. The first and most (a) Mean number of turns survived by agents under each of the seven strategies. evident is their black-box nature. Neither the contents of GPT-4o-mini’s training set nor the retention level of any specific information are publicly known. Since the game modeled in this study can be viewed as a competitive variant of the commons dilemma—a scenario known to GPT-4o-mini—it is impossible to determine whether prior exposure to such problems influenced the model’s responses, or, more critically, what the nature of that influence may have been, if any.

The second major concern is replicability. As long as one relies on a proprietary, closed-source model, scientific replication is only possible for as long as that specific model version remains accessible via API. This does not invalidate the scientific value of such work—science often progresses through iterative experimentation and occasional error—but it does highlight a fundamental limitation. We believe it is essential to clearly acknowledge this limitation, as it touches on the core of what makes a result verifiable and robust.

The model implementation, experimental procedures, and data analysis were all conducted using Python 3.11.3. The full model code used to generate the results’ experiments is available at the following https://anonymous.4open.science/r/LLM_sustainability_game-1C88/.

3. Results

Three distinct scenarios can be identified in all the experiments: 1 corresponds to the case in which all agents go extinct before reaching the end of the simulation; 2 describes the situation where a single agent is able to militarily defeat all others and emerge as the sole survivor; and 3 represents the outcome in which multiple agents successfully survive until the final time step.

3.1. Multiple strategies

median simulation length, suggesting faster collapse or convergence. The variability is higher at lower player counts, with a few outlier runs reaching very high survival times. As the number of players increases, the outcome becomes more tightly clustered around earlier termination points. This suggests that larger populations may accelerate competitive dynamics, leading to quicker system destabilization. Also, a greater population using a fixed amount of resources can consume them quicker.

Figure 3 illustrates both individual and aggregate agent behavior over time for a simulation that concludes with lifestock exhaustion. Panel 3a shows trajectories in resource accumulation across agents, highlighting divergence in success despite shared initial conditions. These diferences could be accounted both individuals’ and neighbors’ strategies. In panel 3b, the aggregate dynamics reveal a depletion of brown resources which is been mitigated towards the end, given that the agents adjust their action according to the prediction regarding the brown blocks. This can be seen also with the dominance of green blocks in the second part of the simulation, suggesting a systemic shift toward sustainable production. The number of agents decreases slightly over time, as the results of competition.

Figure 4 illustrates the system’s behavior under conditions leading to long-term sustainability (which is the 3 scenario). As shown in panel 4a, the individual agents rapidly accumulate green resources while black and red resources diminish early. In panel 4b, which depics the aggregate value of the blocks in the system, it is possible to observe that the total number of brown blocks increases over time, indicating a regenerative dynamic possibly due to restrained exploitation. This was possible also because the number of players stabilizes at two, suggesting early extinction of others – especially the one with more aggressive strategies – could lead to long-term coexistence between survivors. Decision patterns confirm a dominant reliance on green-to-green actions, aligning with the observed environmental recovery.

Table ?? reports summary statistics for the three outcome scenarios 1, 2, and 3 in the multiple strategies experiment. Scenario 3, associated with agent coexistence, features the lowest average number of links and the longest duration , suggesting less dense networks may promote sustainability. In contrast, 1 and 2—associated with extinction and domination—occur at higher link densities. The standard deviation in indicates greater variability in population outcomes under 1 and 2. These ifndings point to a potential trade-of between connectivity and system stability.

3.2. Two strategies

Figure 5b reveals a pattern consistent with that of Figure 2b, indicating that the average simulation length remains largely unafected by the reduction in available strategies from seven to two—at least for the specific strategies examined. In contrast, Figure 5a shows a marked change in individual outcomes: the average survival time for agents using the killer strategy remains stable, while that of the green strategy increases significantly. This improvement can be attributed to the higher likelihood of green agents encountering similarly non-aggressive neighbors, reducing the risk of early elimination.

Figure 6 illustrates a dynamic where both green and red strategies gain traction early, but only green sustains long-term growth in resource accumulation. Panel 6b shows that green blocks eventually dominate, while red and black decline or fluctuate. The number of players rapidly decreases, with only one surviving by turn ten, suggesting an unstable competitive environment that arrives to a trivial equilibrium. Brown resources steadily deplete, indicating over-exploitation, which does not lead to collapse only because a single player were able to win rapidly enough to avoid it.

(a) Individual agents’ black, green, and red block counts over time in an extinction ( 1) run. (b) Total number of blocks (of each color), the surviving agent count, and decision frequencies during 1.

(a) Individual agents’ black, green, and red block counts over time in an extinction ( 3) run. (b) Total number of blocks (of each color), the surviving agent count, and decision frequencies during 3. (a) Average survival time for “killer” vs. “green” strategies in the two‐strategy experiment.

Comparing the two tables, we observe that the average number of players is significantly higher in the two-strategy experiment than in the multiple-strategy one, particularly in scenario 3, suggesting that limiting strategic diversity may foster greater agent survival. Conversely, the average number of links is substantially lower in the two-strategy setup across all scenarios, indicating that sparser networks are associated with extended coexistence. The final simulation turn is relatively consistent across experiments, with slightly higher values in 3 for both settings. Overall, the reduction in strategic complexity appears to simplify interactions and promote more stable outcomes.

3.3. Single strategy

In this case, where all agents adopt the green strategy, only scenario 3 emerged across all 50 simulations. This outcome is far from trivial. Although agents can still attack one another under the green strategy, the dynamics suggest that the level of aggression remains insuficient to lead to complete elimination. Moreover, while agents do generate red blocks, the overall production does not appear to be high enough to exhaust the brown resources. This is supported by the distribution of brown blocks at the end of the simulation shown in Figure 7b, where no run results in zero remaining brown blocks, and several simulations end with a very high number of them.

3.4. No strategy

The final case is not a formal experiment in the same sense as the others, as it was not pre-designed within the same framework. However, we chose to include it after observing the influence of strategy on agent survival. Here, agents are assigned no predefined strategy at all, allowing us to observe the system’s behavior in the absence of structured decision-making. This approach not only complements the previous results—demonstrating conditions under which populations thrive with diferent strategies—but also serves as a preliminary step toward a kind of psychomatics: an exploration of the vital behavioral principles emerging from machines capable of cognitively complex tasks.

In this case, scenario 3 occurred in 38.78% of simulations, while the remaining runs resulted in scenario 1; notably, scenario 2 never emerged. That is, in the absence of predefined strategies, there was no instance in which a single agent succeeded in—or even attempted to—eliminate all others. This result is particularly intriguing: it suggests that when left unguided, agents driven by GPT-4o-mini may spontaneously find a balance both with each other and with the environment, at least with a greater probability. Of course, in the majority of cases, such balance does not arise. Table 4 ofers a possible (a) Individual agents’ black, green, and red block counts over time in an extinction ( 2) run. (b) Total number of blocks (of each color), the surviving agent count, and decision frequencies during 2. (a) Mean (±SD) of population size ( ), network connectivity (ℓ), and duration ( ) for each outcome in the two-strategy experiment.

(b) Histogram of remaining brown blocks at simulation end, showing that no run fully depleted the biosphere under the green-only strategy. explanation, consistent with earlier findings: surviving agent populations tend to have higher foresight capabilities and lower connectivity. This implies they are better at managing brown resources and less likely to engage in destructive interactions.

4. Discussion and conclusion

This paper revisits a sustainability game from the existing literature [ 15, 8 ], in which a long-term commons dilemma is set against short-term competitive dynamics among players. The original agent-based implementation is extended by replacing traditional agents with ones driven by LLMs. This shift serves a dual purpose: first, to gain deeper insight into how LLM-based agents behave when faced with scarce renewable and non-renewable resources under competitive pressure; and second, to explore the potential of using LLMs as the cognitive core of agents within ABMs. The second objective was easily achieved: the results obtained in this study are consistent—though not identical—with those reported in previous work modeling the same game. In particular, we observed that, as in the earlier study, the agents’ ability to process more information significantly increases their likelihood of survival. Additionally, the structure of the interaction network plays a critical role in determining which agents persist over time, shaping the mix of strategies that remains in the system. Finally, the ratio between available resources and the number of players also influences the system’s long-term sustainability, confirming its relevance as a key factor in the dynamics. Regarding the first objective, we found that under mixed-strategy conditions, there are only limited instances in which agents are able to collectively play and win a sustainability game that involves a tension between short-term competition and long-term resource preservation. However, the results also show that when agents are guided by an explicitly encoded strategy—defined at the level of the system prompt, and thus not as deeply embedded as through dedicated training—they can behave in a non-aggressive manner and avoid destroying the environment they inhabit. This suggests that even shallow behavioral priors, if well designed, can be suficient to steer agent populations toward more sustainable outcomes.

In this sense, we can conclude that the ability to manage a commons dilemma in a competitive setting does not depend solely on the intrinsic behavior of the LLM, but also on how it is explicitly instructed. However, the analysis of agent behavior without explicit strategies revealed a noteworthy result: even in the absence of predefined guidance, LLM-based agents were often capable of reaching the end of the simulation and adopting sustainable strategies. This outcome is far from trivial and raises important questions. Does it indicate that these models exhibit inherently sustainable behavior in dynamic environments? Or is it simply a consequence of their prior exposure to similar scenarios during training—for example, through scientific articles on sustainability games? While we partially addressed this by explicitly asking the model whether it recognized the game, the question remains open. What we can confidently take away from this study are two key insights. First, that an LLM, when left unguided, can spontaneously coordinate with other LLMs to achieve a sustainable collective behavior. Second, that this emergent sustainability is fragile: it holds only in the absence of explicit instructions, and can be overridden by a suficiently influential system prompt. This suggests that tests of this kind could serve as valuable tools for assessing the strategic alignment of LLMs in multi-agent contexts.

Finally, there is one additional aspect worth discussing. From the perspective of dynamic system modeling, one non-trivial challenge is assessing whether an LLM can act consistently within a dynamic environment—one that evolves over time and whose state depends on its own history. In this study, we observed that LLM-based agents, even without an extensive set of explicit instructions, were able to operate reasonably well in such contexts, at least in relatively simple scenarios. This finding is encouraging, as it suggests that LLMs may possess a degree of temporal coherence suficient for interacting with non-static environments. Nonetheless, further research is required to determine the level of system complexity and dynamism under which an LLM can still behave efectively and reliably. A possible extension of this work would be to assess whether the results hold when using other LLMs of comparable intelligence, thereby validating that the findings are not specific to OpenAI’s models [32]. Additionally, it would be valuable to investigate the minimum intelligence threshold below which an LLM can no longer efectively participate in the game—failing not only to pursue meaningful strategies but also to produce valid outputs. Further experimentation could also focus on the second cognitive dimension: the quantity and quality of information provided to the model. We identify a relationship between the collapse probability and the amount of information provided, but a more detailed analysis of how access to diferent types of input afects behavior could be performed [33]. Finally, the impact of specific strategies warrants deeper investigation, including the identification of potentially optimal strategy combinations and the emergence of collective phenomena from their interactions [34].

Acknowledgments

This publication is supported by the European Union’s Horizon Europe research and innovation programme under the Marie Skłodowska-Curie Postdoctoral Fellowship Programme, SMASH co-funded under the grant agreement No. 101081355. The operation (SMASH project) is co-funded by the Republic of Slovenia and the European Union from the European Regional Development Fund.

Declaration on Generative AI

The authors have employed Generative AI tools to support code writing, refine the language, and proofread the final version of the text. [19] S. Roman, F. Bertolotti, A master equation for power laws, Royal Society open science 9 (2022) 220531. [20] Y.-S. Chuang, A. Goyal, N. Harlalka, S. Suresh, R. Hawkins, S. Yang, D. Shah, J. Hu, T. T. Rogers, Simulating opinion dynamics with networks of llm-based agents, arXiv preprint arXiv:2311.09618 (2023). [21] Ö. Gürcan, Llm-augmented agent-based modelling for social simulations: Challenges and opportunities, HHAI 2024: Hybrid Human AI Systems for the Social Good (2024) 134–144. [22] F. Bertolotti, A. Locoro, L. Mari, Sensitivity to initial conditions in agent-based models, in: MultiAgent Systems and Agreement Technologies: 17th European Conference, EUMAS 2020, and 7th International Conference, AT 2020, Thessaloniki, Greece, September 14-15, 2020, Revised Selected Papers 17, Springer, 2020, pp. 501–508. [23] R. Occa, F. Bertolotti, et al., Understanding the efect of iot adoption on the behavior of firms: An agent-based model, in: CS & IT Conference Proceedings, volume 12, CS & IT Conference Proceedings, 2022. [24] S. Roman, F. Bertolotti, Global history, the emergence of chaos and inducing sustainability in networks of socio-ecological systems, Plos one 18 (2023) e0293391. [25] F. Bertolotti, F. Schettini, L. Ferrario, D. Bellavia, E. Foglia, A prediction framework for pharmaceutical drug consumption using short time-series, Expert systems with applications 253 (2024) 124265. [26] S. Roman, Historical dynamics of the chinese dynasties, Heliyon 7 (2021). [27] S. Roman, Theories and models: Understanding and predicting societal collapse, in: The Era of Global Risk: An Introduction to Existential Risk Studies, Open Book Publishers, 2023, pp. 27–54.

URL: https://doi.org/10.11647/OBP.0336.02. [28] N. Saporiti, V. Cannas, G. Pirovano, R. Pozzi, T. Rossi, Barriers and enablers to the implementation of digital twins in manufacturing companies: A literature review (2020), Proceedings of the Summer School Francesco Turco (2020). [29] E. Bonabeau, Agent-based modeling: Methods and techniques for simulating human systems,

Proceedings of the national academy of sciences 99 (2002) 7280–7287. [30] F. Bertolotti, N. Kadera, L. Pasquino, L. Mari, An epidemiological extension of the el farol bar problem, Frontiers in Big Data 8 (2025) 1519369. [31] C. M. Macal, M. J. North, Agent-based modeling and simulation, in: Proceedings of the 2009 winter simulation conference (WSC), IEEE, 2009, pp. 86–98. [32] F. Bertolotti, L. Mari, An llm-based delphi study to predict genai evolution, arXiv preprint arXiv:2502.21092 (2025). [33] F. Bertolotti, F. Schettini, F. Asperti, E. Foglia, A gravity model for emergency departments,

Scientific reports 15 (2025) 19537. [34] F. Bertolotti, R. Occa, “roads? where we’re going we don’t need roads.” using agent-based modeling to analyze the economic impact of hyperloop introduction on a supply chain, in: European Conference on Multi-Agent Systems, Springer, 2020, pp. 493–500.

[1]

Liu ,

Zhang ,

Li ,

Liu ,

Yang , Dynamic llm-agent network: An llm-agent collaboration framework with agent team optimization , arXiv preprint arXiv:2310.02170 ( 2023 ).

[2]

Vaswani ,

Shazeer ,

Parmar ,

Uszkoreit ,

Jones ,

A. N.

Gomez , Ł. Kaiser, I. Polosukhin , Attention is all you need , Advances in neural information processing systems 30 ( 2017 ).

[3]

Shen ,

Tenenholtz ,

J. B.

Hall ,

Alvarez-Melis ,

Fusi , Tag-llm: Repurposing general-purpose llms for specialized domains , arXiv preprint arXiv:2402.05140 ( 2024 ).

[4]

Piao ,

Yan ,

Zhang ,

Li ,

Yan ,

Lan ,

Lu ,

Zheng ,

J. Y.

Wang ,

Zhou , et al., Agentsociety: Large-scale simulation of llm-driven generative agents advances understanding of human behaviors and society , arXiv preprint arXiv:2502.08691 ( 2025 ).

[5]

Yang ,

Peng ,

Wang , W. Zhang, Multi-llm-agent systems: Techniques and business perspectives , arXiv preprint arXiv:2411.14033 ( 2024 ).

[6]

Sreedhar ,

Cai , J. Ma,

J. V.

Nickerson ,

L. B.

Chilton , Simulating cooperative prosocial behavior with multi-agent llms: Evidence and mechanisms for ai agents to inform policy decisions , in: Proceedings of the 30th International Conference on Intelligent User Interfaces , 2025 , pp. 1272 - 1286 .

[7]

Osaulenko ,

Yatsenko ,

Reznikova ,

Rusak ,

Nitsenko , et al., The productive capacity of countries through the prism of sustainable development goals: Challenges to international economic security and to competitiveness, Financial and credit activity problems of theory and practice 2 ( 2020 ) 492 - 499 .

[8]

Bertolotti , S. Roman, Balancing long-term and short-term strategies in a sustainability game , Iscience 27 ( 2024 ).

[9]

Bertolotti ,

Roman , Risk sensitivity of production studios on the us movie market: an agentbased simulation ., in: WOA , 2021 , pp. 210 - 223 .

[10]

Bertolotti ,

Roman , Risk sensitive scheduling strategies of production studios on the us movie market: An agent-based simulation , Intelligenza Artificiale 16 ( 2022 ) 81 - 92 .

[11] G. Hardin, The tragedy of the commons , Science 162 ( 1968 ) 1243 - 1248 .

[12]

C. W.

Clark ,

G. R.

Munro , The economics of fishing and modern capital theory: a simplified approach , Journal of environmental economics and management 2 ( 1975 ) 92 - 106 .

[13]

P. J.

Deadman , Modelling individual behaviour and group performance in an intelligent agentbased simulation of the tragedy of the commons , Journal of Environmental Management 56 ( 1999 ) 159 - 172 .

[14]

Chawla ,

Piva ,

Ahmed ,

Jia , I. Levy ,

S. W. C.

Chang , Individual decision-making underlying the tragedy of the commons , bioRxiv ( 2022 ). doi: 10 .1101/ 2022 .11.29.518377.

[15]

Bertolotti ,

Roman , The evolution of risk sensitivity in a sustainability game: an agent-based model ., in: WOA , 2022 , pp. 101 - 115 .

[16]

Piatti ,

Jin ,

Kleiman-Weiner ,

Schölkopf ,

Sachan ,

Mihalcea , Cooperate or collapse: Emergence of sustainability behaviors in a society of llm agents , arXiv preprint arXiv:2404.16698 ( 2024 ).

[17] R. M. Turner , The tragedy of the commons and distributed AI systems , Department of Computer Science, University of New Hampshire Durham, NH , 1993 .

[18]

Saporiti ,

Strozzi , T. Rossi, Digital twin relationship with virtual reality and augmented reality: a bibliometric review , Proceedings of the Summer School Francesco Turco ( 2021 ).