Towards Intelligent Pulverized Systems: a Modern Approach for Edge-Cloud Services

Davide Domini, Nicolas Farabegoli, Gianluca Aguzzi and Mirko Viroli
Università di Bologna – ALMA MATER STUDIORUM, Via Dell'Università 50, 47521 Cesena, Italy

Abstract
Emerging trends are leveraging the potential of the edge-cloud continuum to foster the creation of smart services capable of adapting to the dynamic nature of modern computing landscapes. This adaptation is achievable through two primary methods: by leveraging the underlying architecture to refine machine learning algorithms, and by implementing machine learning algorithms to optimize the distribution of resources and services intelligently. This paper explores the latter approach, focusing on recent advancements in pulverized architecture, collective intelligence, and many-agent reinforcement learning systems. This novel trend, which we refer to as intelligent pulverized system (IPS), aims to create a new generation of services that can adapt to the complex and dynamic nature of the edge-cloud continuum. Our proposed learning framework integrates many-agent reinforcement learning, graph neural networks, and aggregate computing to create intelligent services tailored for this environment. We discuss the application of this framework across different levels of the pulverization model, illustrating its potential to enhance the adaptability and efficiency of services within the edge-cloud continuum.

Keywords: Edge-Cloud Continuum, Many-Agent Reinforcement Learning, Pulverization

1. Introduction

Recent technological developments are fostering a computational landscape that is increasingly articulated and complex across various levels. The historical distinction between cloud, fog, and edge computing is becoming progressively blurred, giving rise to what is known as the edge-cloud continuum (ECC) [1, 2].
This shift is driven by the necessity for services developed on these platforms to be highly opportunistic, capable of dynamically moving up and down the continuum based on local needs and computational requirements. The ECC is a valuable resource for creating intelligent services that leverage recent advancements in machine learning [3]. These services must be able to utilize the continuum's full potential by adapting to the changing conditions and demands of the environment. Furthermore, making the continuum itself more intelligent is essential for developing complex applications, particularly those that are collective in nature. Modern applications, such as those in pervasive [4], collective [5], and ubiquitous [6] computing, often require collective computations rather than mere local computations. Examples include applications for crowd control, traffic management, and energy monitoring [7, 8]. These scenarios highlight the need for a holistic view of the system rather than just local optimizations.

WOA 2024: 25th Workshop "From Objects to Agents", July 8-10, 2024, Forte di Bard (AO), Italy
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
This paper proposes a new vision of intelligent pulverized systems (IPS) for creating a smarter ECC by leveraging recent developments in many-agent reinforcement learning [9], pulverization [10, 11], and macroprogramming [12]. Pulverization is a modern approach that describes collective computations capable of being "broken down" and distributed across multiple hosts, which is crucial for fully exploiting the continuum [10]. Additionally, we explore how macroprogramming approaches can effectively capture the collective aspect of these systems, providing a comprehensive framework for intelligent service development within the ECC [12]. This structured approach aims to enhance the intelligence of the ECC, enabling the development of sophisticated, adaptive, and collective applications that can efficiently utilize the continuum's capabilities.

The rest of the paper is structured as follows: Section 2 provides an overview of the edge-cloud continuum, its characteristics, and the concept of pulverization; Section 3 discusses the integration of many-agent reinforcement learning, GNNs, and aggregate computing to develop intelligent services; Section 4 details the application of intelligent collective services to various levels of pulverization, including intelligent reconfiguration, adaptive communication, and scheduling; and Section 5 summarizes our findings and outlines potential future directions for research in this area.

2. Background

2.1. Edge-Cloud Continuum

The rapid adoption of cloud technologies was driven by the benefits of high availability and scalability of computational resources. The rise of cloud computing has led to the proliferation of new applications involving smart devices; these applications are typically reified in the Internet of Things (IoT) [13] and Cyber-Physical System (CPS) [14] domains.
To meet the increasing demand for low latency and high throughput in such applications, the edge computing [15] paradigm has been introduced to bring computational resources closer to end users, thus reducing latency and network traffic. In this context, the highly distributed nature of such applications and the vast amount of generated data have driven the shift from a centralized cloud computing paradigm to a more distributed model, where the cloud is integrated with the edge. Such integration gave rise to a new, hybrid paradigm, called the edge-cloud continuum [1, 2]. Depending on the specific scenario and scope, the Edge-Cloud Continuum (ECC) can have slightly different meanings [2]. Generally, the ECC refers to a distributed computing environment that extends from the cloud to the edge, where this continuum can be effectively leveraged to optimize metrics and quality of service. This new paradigm is characterized by a high level of heterogeneity in terms of devices, networks, and services. Consequently, various types of hardware devices are part of the continuum, ranging from high-end servers to wearable or embedded devices equipped with microcontrollers. Ideally, any device equipped with a computational unit and a network connection can be part of the continuum. This paradigm includes diverse hardware and software stacks (e.g., Linux, embedded firmware, Android Wear, Docker) and various network technologies (e.g., Ethernet, WiFi, ZigBee, LoRa, 5G/6G), making the ECC a highly heterogeneous and dynamic environment, challenging to manage in terms of resource allocation and service deployment. While the continuum offers new possibilities to optimize the performance of applications, the complexity of the environment makes it difficult to fully exploit the potential of the ECC, fostering the need for novel intelligent solutions.
The rapid increase in data volumes from various applications is driving the evolution of distributed digital infrastructures for data analytics and Machine Learning (ML). While such workloads have traditionally relied on cloud infrastructures, the need for low-latency and secure processing has shifted part of the computation to IoT edge devices, making the ECC a prominent platform for distributed intelligence [3].

2.2. Macroprogramming

When dealing with large-scale systems, such as the ECC, it is helpful to shift the focus from individual devices to the collective system, as the behavior of the system as a whole is more important than the behavior of individual devices. Consider, for example, a crowd congestion alarm system. In this case, each smartphone could be part of an ECC system aimed at identifying crowded areas and intelligently guiding the crowd to disperse, thereby avoiding emergencies. Programming such behavior with a device-centric view is complicated because the collective paradigm is not incorporated at that level. In this direction, a modern area of research aims to shift the programmer's focus from the device to the aggregate. This area of research is called macroprogramming [12], which has its roots in Wireless Sensor Networks (WSN) [16] and has evolved to be used in opportunistic edge computing contexts today [17]. The core of macroprogramming lies in identifying a collective abstraction that then becomes a first-class citizen of the programming language. Various languages have been defined in this scenario, but most of them are designed for specific applications (e.g., PyoT [18] for IoT and Buzz [19] for robotics). A modern approach to macroprogramming is aggregate computing [20], a functional, top-down, global-to-local paradigm that allows defining collective behaviors through the manipulation of a distributed data structure called computational fields.
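To give a concrete flavor of this style, the following minimal Python sketch (not the API of any actual aggregate-computing language) shows a classic computational field, a hop-count gradient, emerging from the repeated execution of the same local rule on every device:

```python
# Minimal sketch of a computational field: a hop-count "gradient"
# (distance to the nearest source), computed purely from local rules.
# Illustrative only; not the API of a real aggregate-computing language.

INF = float("inf")

def gradient_round(fields, neighbors, sources):
    """One synchronous round: every device re-evaluates the same local rule."""
    new_fields = {}
    for dev in fields:
        if dev in sources:
            new_fields[dev] = 0.0
        else:
            nbr_values = [fields[n] for n in neighbors[dev]]
            new_fields[dev] = min(nbr_values, default=INF) + 1.0
    return new_fields

# A small line topology: d0 - d1 - d2 - d3, with d0 as the source.
neighbors = {"d0": ["d1"], "d1": ["d0", "d2"], "d2": ["d1", "d3"], "d3": ["d2"]}
fields = {d: INF for d in neighbors}
for _ in range(5):  # iterate until the field self-stabilizes
    fields = gradient_round(fields, neighbors, {"d0"})

print(fields)  # {'d0': 0.0, 'd1': 1.0, 'd2': 2.0, 'd3': 3.0}
```

Each device only reads its neighbors' values, yet the globally specified field (distance to the source) emerges and self-stabilizes, which is the global-to-local mapping at the heart of the paradigm.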
This macro view is then broken down into local executions on individual devices, which, by computing iteratively and continuously, achieve the specified collective behavior. Aggregate computing has been used in the context of the ECC to create additive behaviors based on devices' location in space [12]. In this paper, however, we will focus on using this collective abstraction to better guide machine learning in the ECC context. Indeed, aggregate computing was shown to be a powerful tool to guide machine learning in the context of large-scale distributed systems [21, 22, 23].

[Figure 1: Representation of the different levels of a pulverized system. On the left side, the three abstraction levels: the logical layer, the physical layer, and the infrastructure layer. On the right side, the logical device pulverized into the five subcomponents.]

2.3. Pulverization

Pulverization [10, 11] is an approach to simplify the design and deployment of (collective) distributed applications by devising a peculiar partitioning into independently deployable components. The main goal is to provide the developer with a way to specify the application logic in a deployment-agnostic way, letting the platform/middleware take care of the deployment and the communication between the components. In this context, this approach mainly targets the deployment of collective applications, where, through macroprogramming, the application is designed as a composition of collective components (the collective layer depicted in Figure 1). To achieve this, the pulverization model defines two main abstraction levels: the logical layer and the physical layer, as depicted in Figure 1.
The former is the level at which the developer reasons about the application logic, while the latter is the level at which the specified application is deployed and executed.

System Model. At the logical level, the developer specifies the application as an ensemble of application-specific devices forming an arbitrary topology. An application-specific device can be "pulverized" if it can be decomposed into pulverized components representing: a set of sensors (σ), a set of actuators (α), a state holding the device's knowledge (κ), a communication component to interact with other devices (χ), and a behavior component defining the device's logic (β). Figure 1 (right side) shows a logical device pulverized into the five subcomponents. Typically, the logical layer is composed of several (hundreds or thousands of) devices forming a dynamic graph. Each logical device has a neighborhood with which it can interact. Such neighborhoods can vary over time, and the way the neighborhood is defined can be application-specific (e.g., proximity-based or based on physical connections). In a pulverized system, the χ component is in charge of managing the communication between the devices, implementing the neighborhood definition and the message exchange.

Execution Model. The components interact with each other to perform a MAPE-K-like loop: the sensors (σ) collect data from the environment, and the neighbors' messages are received by the communication component (χ); the behavior component (β) applies the logic, taking as input the state (κ), the data from the sensors (σ), and the neighbors' messages. The produced output contains the new state (κ), the messages to be sent to the neighbors via the communication component (χ), and the prescriptive actions to be executed by the actuators (α). Notably, the flexibility of this model allows the implementation of such an interaction loop in many ways.
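As an illustration, one round of the loop just described might be sketched in Python as follows; all names and the toy logic are illustrative, not part of an actual pulverization middleware:

```python
from dataclasses import dataclass, field

# Illustrative sketch of one execution round of a pulverized logical device:
# sensors (sigma) and inbound messages (chi) feed the behavior (beta), which
# yields a new state (kappa), outbound messages, and actuations (alpha).

@dataclass
class PulverizedDevice:
    kappa: dict = field(default_factory=dict)   # state / knowledge component
    outbox: list = field(default_factory=list)  # messages for chi to send

    def sigma(self):                            # sensor readings (stubbed)
        return {"temperature": 21.5}

    def beta(self, state, perceptions, inbound):  # device logic (toy rule)
        avg_nbr = sum(inbound) / len(inbound) if inbound else 0.0
        new_state = {"last_temp": perceptions["temperature"], "nbr_avg": avg_nbr}
        messages = [perceptions["temperature"]]   # value shared with neighbors
        actuations = ["cool"] if perceptions["temperature"] > avg_nbr else []
        return new_state, messages, actuations

    def alpha(self, actuations):                # actuators (stubbed)
        return actuations

    def round(self, inbound_messages):
        """One MAPE-K-like round: sense, compute, share, actuate."""
        perceptions = self.sigma()
        self.kappa, self.outbox, acts = self.beta(self.kappa, perceptions,
                                                  inbound_messages)
        return self.alpha(acts)

dev = PulverizedDevice()
acts = dev.round(inbound_messages=[20.0, 22.0])
print(dev.kappa, dev.outbox, acts)
```

The point of the sketch is the clean separation of concerns: each method could live on a different host, with the middleware wiring σ, χ, β, κ, and α together transparently.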
The most common way to implement this interaction is via a round-based execution model, where each component, at fixed intervals, sends and/or receives messages from the other components. However, more complex implementations can be devised, like a purely reactive model where the components react to the messages received without a fixed schedule; in this case, the β component (intelligently) leads the interaction loop. Another way to implement such a loop is to adopt a "best-effort" approach, where all components requiring input from the other components compute their logic against the most recent available data.

Flexible Pulverized Deployments for the ECC. The pulverization approach is particularly suitable for collective applications, enabling non-trivial deployments. When macroprogramming approaches like aggregate computing [20] are used, the pulverization model naturally integrates with them, simplifying development, especially in ECC infrastructures. Otherwise, pulverization can be exploited to design custom systems with this partitioning in mind. Whenever an application is pulverized, a deployment mapping must be provided between the logical and the infrastructure levels. In this mapping, a logical device is typically deployed on either multiple physical devices or a single one. A logical device is often associated with an application-level physical device that needs to be managed or controlled (e.g., a sensor, a drone, or a person equipped with wearables). However, the capabilities of that physical device might not be enough to host all the components, and the pulverization might require the deployment of the components on different physical devices, namely infrastructure-level devices. The main advantage of pulverization is enabling many deployments without affecting or changing the application logic.
Thanks to this feature, a system can be deployed on arbitrary infrastructural devices as long as they provide the required capabilities to host the components (for example, a sensor component can be executed only on devices hosting the appropriate sensors, and the behavior component can be allocated on a host having sufficient computational power). In this sense, the ECC can be exploited to opportunistically allocate the components, enabling deployments that would otherwise be impossible.

2.4. Machine Learning for ECC

The emergence of the edge-cloud continuum offers significant potential for the development of new intelligent systems, particularly for applications that rely on machine learning techniques. In the following, we survey the current state of the art in the field of machine learning for the edge-cloud continuum.

ECC for ML Applications. The ECC has attracted significant attention from the ML community: highly distributed networks composed of devices capable of generating a large amount of data can provide new resources to enhance the quality of ML models. However, these devices often have limited computing resources and latency constraints that prevent both local learning and offloading learning tasks to cloud servers. Conversely, the ECC provides a great platform with edge servers that add computational resources close to where the data is generated. In recent years, several studies in the literature have proposed leveraging these new resources, for instance: i) real-time video streaming analysis for applications such as traffic control, surveillance, security, and real-time object detection [24, 25, 26]; ii) smart vehicular systems for pedestrian recognition and crash detection [8, 27]; and iii) smart cities for energy management and fault detection [7, 28].

Machine Learning for ECC. Machine learning can be exploited to optimize the deployment of services in the ECC, since leveraging its potential in real-life scenarios is a non-trivial task.
The devices are heterogeneous and have multiple constraints (e.g., energy, latency, computational power). For this reason, it is essential to adapt the deployment dynamically and opportunistically. In the literature, several approaches based on heuristics and meta-heuristics have been proposed [29, 30, 31]; nonetheless, these solutions are crafted for specific use cases and do not adapt over time. In recent years, solutions based on supervised learning techniques [32, 33, 34] have been proposed to predict the characteristics of microservices in order to find solutions that meet various constraints, such as quality of service and energy consumption. This approach requires a vast amount of data representing the various possible states of the system and the possible actions to be taken, which is challenging to obtain given the complexity of these systems. For these reasons, reinforcement learning techniques are emerging as a natural approach. This paradigm allows the optimization of the decision-making process by observing the evolution of the system over time, without the initial need to collect a large amount of data. In [35], the authors introduce a solution based on deep reinforcement learning (specifically, a Dueling DQN model [36]) to optimize the deployment of microservices. Initially, the global state of the system is observed (i.e., CPU usage, memory, and network bandwidth). These observations are then passed to the neural network, which predicts end-to-end latency and peak throughput for each possible action, thereby defining a new resource partitioning. The work in [37], instead, proposes a solution based on distributed deep reinforcement learning (using an actor-critic model) to address the task offloading problem, i.e., deciding which available device will execute a certain task. To achieve this, tasks are divided into three categories, namely: i) delay-sensitive tasks; ii) energy-sensitive tasks; and iii) insensitive tasks.
At this point, each device runs its actor network to decide whether a task can be executed locally, on an edge server, or in the cloud. Finally, a centralized critic network evaluates the actions taken by the various devices, providing feedback on their effectiveness. In addition to the aforementioned studies, other works leverage reinforcement learning for various optimization tasks in the ECC. For instance, [38] employs offline RL for task scheduling, [39] uses RL to optimize data caching in edge servers, and [40] applies RL to enhance network resource allocation.

3. Intelligent Collective Services for ECC

The pulverization model proposes logical systems in which a series of logical entities form a dynamic graph that exposes the following properties:

• large scale: an ECC system can be extensively scaled to encompass thousands of devices;
• locality: edge devices are spatially located, and their behavior may depend on their location;
• partial observability: edge devices do not have a perception of the entire collective but can perceive a certain neighborhood or aggregated information;
• heterogeneity: devices belonging to the ECC are highly heterogeneous due to their diverse nature.

Applying standard supervised learning techniques to ECC services is challenging because it is difficult to determine the correct behavior a priori, given the highly dynamic nature of these systems. Therefore, considering these properties, we argue that a combination of many-agent reinforcement learning [9] (to encode large-scale dynamics), graph neural networks [41] (to encode spatial relationships), and aggregate computing (to encode collective feedback and observations) can be a suitable approach for developing intelligent services for the ECC. In particular, many-agent reinforcement learning plays a crucial role in the ECC, as it enables scalable policy learning and influences behavior through delayed collective feedback.
In the following, we introduce both many-agent reinforcement learning and graph neural networks, and then discuss the proposed algorithm for developing intelligent services for the ECC.

3.1. Formalization

Many-Agent Reinforcement Learning is an extension of reinforcement learning in which, instead of a single learning agent, there is an entire family of intelligent agents that learn concurrently. They are referred to as "many-agent" to distinguish them from "multi-agent", as the number of devices can be extremely large, potentially involving thousands of devices. Formally, a many-agent system can be modeled through a family of agents A = (𝒮, 𝒪, 𝒜, ℛ, π) that live and interact in an environment ℰ = (𝒫, A, 𝒯, ξ) [42]. In particular, the agent prototype A is defined by:

• 𝒮, 𝒪, and 𝒜, the sets of local states, observations, and actions, respectively;
• ℛ : 𝒮 → ℝ, the reward function, influenced by the environment;
• π : 𝒪 → 𝒜, the policy mapping observations to actions, which can be deterministic or stochastic.

The environment ℰ is defined by:

• 𝒫, the fixed population of agents;
• A, the agent prototype governing each agent in 𝒫;
• 𝒯 : 𝒮 × 𝒜^𝒫 × 𝒮 → ℝ^𝒫, the global transition function influenced by agent actions, yielding a collective reward;
• ξ : 𝒮 → 𝒪^𝒫, the global observation model.

The system's evolution over time can be captured as follows:

𝒜^𝒫_t = π(ξ(𝒮_t)),    𝒮_{t+1} = 𝒯(𝒮_t, 𝒜^𝒫_t)

At any given time t, the system can be represented as a graph G^t = (V^t, E^t), where E^t is constructed from a neighborhood relationship. Each node has an associated local observation o^t_v ∈ 𝒪. This graph representation is crucial both for computing computational fields and for serving as input to a GNN, as demonstrated in previous works [43, 44, 45, 21].

Graph Neural Networks (GNNs) are neural network models designed to process graph-structured data [41].
Given a graph G = (V, E), where V is the set of nodes and E ⊆ V × V represents the edges, each node v ∈ V has an associated feature set f_v. The aim of a GNN is to learn a node embedding h_v for each node v ∈ V by aggregating information from its neighbors 𝒩_G(v) through a process known as message passing [46]. The process involves three main steps in each layer k of the GNN:

m_{uv}^{(k)} = ψ^{(k)}(h_u^{(k-1)}, h_v^{(k-1)}, e_{uv})    (1)

a_u^{(k)} = ⨁ {m_{uv}^{(k)} : v ∈ 𝒩_G(u)}    (2)

h_u^{(k)} = φ^{(k)}(h_u^{(k-1)}, a_u^{(k)})    (3)

where m_{uv}^{(k)} is the message from node u to node v at layer k, ⨁ denotes the aggregation function, and φ^{(k)} is the update function. Initially, h_v^{(0)} is set to the node's feature vector f_v. GNNs enable the effective processing of graph-structured data by iteratively updating node embeddings, capturing the local graph structure around each node. This formulation facilitates the application of GNNs to various tasks within the ECC, where the spatial relationships between devices are critical.

GNN(G_f) = {h_u^{(k)} : u ∈ V, k ∈ ℕ}    (4)

where G_f is the graph with features f_v for each node v ∈ V, formally: G_f = (V, E, {f_v : v ∈ V}). By leveraging the unique capabilities of GNNs, it is possible to develop more sophisticated and adaptive services that can dynamically respond to the complex and distributed nature of the ECC.

3.2. Contribution

In this section, we discuss an approach for performing many-agent reinforcement learning leveraging GNNs in pulverized systems. In this context, as in similar many-agent scenarios, we apply the pattern known as centralized learning and distributed execution [47]. This is because, at a large scale and with only partial observability, having a global view of the task helps in better learning collective tasks.
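A toy NumPy sketch of one message-passing layer in the style of Eqs. (1)-(3), with ψ a linear map over concatenated embeddings, sum as the aggregation ⨁, and φ a tanh update; all weights, sizes, and names are illustrative, not a specific GNN architecture:

```python
import numpy as np

# Toy sketch of one message-passing layer: psi builds messages from pairs of
# node embeddings (Eq. 1), sum is the aggregation (Eq. 2), and phi updates
# each node embedding (Eq. 3). Weights are random and fixed for illustration.

rng = np.random.default_rng(0)
d = 4                                    # embedding size
W_msg = rng.normal(size=(2 * d, d))      # psi: linear map on [h_u, h_v]
W_upd = rng.normal(size=(2 * d, d))      # phi: linear map on [h_u, a_u]

def psi(h_u, h_v):
    # Eq. (1), ignoring edge features e_uv for simplicity
    return np.concatenate([h_u, h_v]) @ W_msg

def gnn_layer(h, adj):
    h_new = {}
    for u in h:
        msgs = [psi(h[u], h[v]) for v in adj[u]]             # messages on edges at u
        a_u = np.sum(msgs, axis=0) if msgs else np.zeros(d)  # Eq. (2): sum-aggregate
        h_new[u] = np.tanh(np.concatenate([h[u], a_u]) @ W_upd)  # Eq. (3)
    return h_new

# Tiny path graph 0 - 1 - 2; features f_v initialize h_v^(0)
adj = {0: [1], 1: [0, 2], 2: [1]}
h = {v: rng.normal(size=d) for v in adj}   # h^(0) = f_v
h = gnn_layer(h, adj)                      # one layer, k = 1
print({v: emb.shape for v, emb in h.items()})
```

Stacking such layers lets each node's embedding absorb information from progressively larger neighborhoods, which is exactly the locality structure the ECC graph exposes.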
Learning dynamics. We base our approach on the pulverization model, under the assumption that there exists a set of components that can act according to local logic; here, we are not concerned with the specific type of actions. Specifically, as described in Algorithm 1, for each time step t, a graph G^t represents the connectivity between devices. Each node in the graph corresponds to a local device, and edges represent the communication or influence between devices. Each node i ∈ G^t has an associated observation o_i^t, creating a decorated graph G_o^t. This same graph can be passed to a macro program P (e.g., aggregate computing) to encode global system information (e.g., areas of high consumption in the case of energy distribution, or the more heavily allocated areas of the system in the case of task distribution). The macro program aggregates local information to provide a global perspective, which is crucial for the GNN to make informed decisions. Formally, an evaluation of a macro program P on a graph G_o^t produces a new graph G_m^t, where each node is associated with its previous local observation and the evolution of the macro program: m_i^t = (o_i^t, P(G_o^t)). This graph is then passed to a GNN that computes the actions a^t ∈ 𝒜^𝒫 to be taken. These actions are executed in the environment (i.e., the various logical components execute what they need to), transitioning to a new collective graph G_o^{t+1} and yielding a reinforcement signal R_t^𝒫. The reinforcement signal R_t^𝒫 can be split into local and global components. The local component is derived from individual device metrics (e.g., battery consumption), while the global component is computed from aggregate metrics (e.g., average consumption in a region), which can also be computed by another macro program P^R.
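As a rough, self-contained Python skeleton of these learning dynamics (the macro program, GNN policy, and environment below are toy stand-ins, not actual implementations):

```python
# Skeleton of one centralized-learning step over the decorated graph.
# All components are hypothetical stubs standing in for the macro program P,
# the GNN policy, and the environment transition discussed in the text.

def macro_program(graph, observations):
    # P: decorate each node with an aggregate view (here, the neighborhood mean)
    return {u: (observations[u],
                sum(observations[v] for v in graph[u]) / max(len(graph[u]), 1))
            for u in graph}

def gnn_policy(decorated):
    # Stand-in for the GNN: act when the local reading exceeds the aggregate
    return {u: int(local > agg) for u, (local, agg) in decorated.items()}

def environment_step(observations, actions):
    # Toy transition plus a per-node reward (the local part of the signal R)
    next_obs = {u: obs - 0.5 * actions[u] for u, obs in observations.items()}
    rewards = {u: -abs(next_obs[u]) for u in observations}
    return next_obs, rewards

graph = {0: [1], 1: [0, 2], 2: [1]}
obs = {0: 1.0, 1: 0.2, 2: -0.4}
for t in range(3):                       # a few iterations of the loop
    decorated = macro_program(graph, obs)    # G_o^t -> G_m^t
    actions = gnn_policy(decorated)          # GNN picks actions
    obs, rewards = environment_step(obs, actions)
    # here a real implementation would update the GNN from (obs, rewards)
print(obs, rewards)
```

The global part of the reward, computed by a second macro program P^R over regions, would be added to the per-node rewards before the policy update.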
Algorithm 1: GNN-based many-agent reinforcement learning in the ECC
1: Initialize: local device observations o^0
2: for each time step t do
3:     Create decorated graph G_o^t with observations o^t
4:     Apply macro program P to G_o^t to produce aggregate observations G_m^t
5:     Input G_m^t into the GNN to compute actions a_i^t ∈ 𝒜
6:     Execute actions a^t in the environment
7:     Transition to new graph G_o^{t+1} and obtain reinforcement signal R^t
8:     Calculate local and collective reinforcement signals
9:     Update the GNN with the new graph G_o^{t+1}, observations o^{t+1}, and reinforcement signals R^t
10: end for

Implementation discussion. This section proposes some hints for a future concrete implementation of the algorithm described above and in Algorithm 1. First, it is worth noting that the proposed solution is completely agnostic to the chosen many-agent reinforcement learning algorithm. The only constraint is that such an algorithm must support the centralized learning and distributed execution model. On the one hand, a classic approach is to implement the DQN algorithm [48] to approximate a function that, for each observation, provides the most valuable action (i.e., value-based methods). On the other hand, policy-based algorithms exist. For instance, PPO [49] is based on a simple actor-critic method. In this framework, each agent has an actor network used to select the current action, while a central entity has a critic network used to reward each action. Thus, this method fits the centralized learning and distributed execution constraint. Moreover, a study shows the effectiveness of PPO in multi-agent cooperative settings [50]. To avoid issues related to the large number of agents present in the reference context, such as the curse of dimensionality and the exponential growth of interactions between agents, the described approaches can be integrated with modern solutions that make learning more stable.
For example, mean-field reinforcement learning [51] integrates both DQN and actor-critic approaches by ensuring that each agent is influenced not by the individual actions of its neighbors but by their average. Additionally, the GNN itself can be distributed in the system in such a way that it can be executed in a distributed manner without the need for a central cloud [43]. Furthermore, thanks to the self-organising coordination regions [52] pattern, these learning points are not necessarily known a priori and can also change over time, thereby enabling dynamic adaptation and continual learning [53] in changing environments.

4. Intelligent Pulverized System

In this section, we explore how the concept of intelligent collective services can be applied to various levels of pulverization, building upon the proposed model to enhance system efficiency and adaptability. In particular, in the following, we focus on three main aspects: (i) reconfiguration: how reconfiguration policies can be used to optimize the deployment of pulverized components at runtime; (ii) communication: how communication protocols can be learned to optimize the exchange of information between devices; and (iii) scheduling: how adaptive scheduling policies can be used to optimize the execution of pulverized components.

4.1. Reconfiguration

Discussion. In the ECC, reconfiguration is a key capability. Specifically, with pulverization, reconfiguration allows for the offloading of one or more components of a logical device (right side of Figure 1) from one host to another to meet certain constraints. These constraints can be expressed through reconfiguration rules, which can be either local (e.g., the host's battery drops below a certain threshold) or global (e.g., defined via aggregate computing) to preserve global coherence and prevent oscillatory conditions.
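A local, threshold-based reconfiguration rule of this kind might be sketched as follows; all names, thresholds, and the capability check are hypothetical, not part of an actual pulverization middleware:

```python
# Illustrative sketch of a local, threshold-based reconfiguration rule:
# when the current host's battery falls below a threshold, the behavior
# component (beta) is offloaded to the best candidate host that still
# satisfies its capability requirements. All names are hypothetical.

BATTERY_THRESHOLD = 0.2

def pick_host_for_beta(current_host, candidates):
    """Return the name of the host that should run the beta component."""
    if current_host["battery"] >= BATTERY_THRESHOLD:
        return current_host["name"]        # rule not triggered: stay put
    eligible = [h for h in candidates if h["cpu_free"] >= 0.5]
    if not eligible:
        return current_host["name"]        # no viable target: degrade locally
    # offload to the eligible host with the most free CPU
    return max(eligible, key=lambda h: h["cpu_free"])["name"]

smartphone = {"name": "smartphone", "battery": 0.1, "cpu_free": 0.3}
hosts = [{"name": "edge-server", "battery": 1.0, "cpu_free": 0.8},
         {"name": "cloud-vm", "battery": 1.0, "cpu_free": 0.6}]
print(pick_host_for_beta(smartphone, hosts))   # prints "edge-server"
```

Rules of this shape are easy to write but hard-coded: the thresholds and the eligibility predicate are fixed at design time.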
While these approaches can be effective in simple scenarios, they have limitations in more complex situations: (i) they can only represent relatively simple rules; (ii) it is challenging to define all rules a priori, as they can be numerous and complex; and (iii) the rules are fixed, so if the system changes over time, they cannot adapt.

[Figure 2: Representation of the main parts in which intelligence can support the pulverization model. Intelligent Scheduling manages the intelligent execution of the behavior, Smart Communication supports the χ component in efficient communication and neighborhood definition, and Intelligent Components Reconfiguration manages the intelligent and opportunistic relocation of the pulverized components, improving the system's performance.]

Vision. We believe that an interesting direction is to learn these rules through the proposed many-agent reinforcement learning approach. Specifically, using online reinforcement learning techniques can help manage potential changes or specific cases that were not captured beforehand. In the literature, some works have attempted to explore this field [35, 54, 55]. However, they focus on entire microservices. Our approach offers the following advantages:

1. it is possible to operate at a finer granularity, not being forced to relocate the entire logical device, but rather acting at the level of individual components;
2. using macroprogramming and graph neural networks, it is possible to integrate neighborhood information, thereby enriching the knowledge of individual devices regarding the global state and objective of the system;
3. it is possible to define complex reconfiguration rules and to learn new rules not known at design time.

4.2.
Communication Discussion In collective systems, communication is a crucial aspect: through device communication and coordination, a global common goal can be achieved. In this sense, the 𝜒 component of the pulverization model is responsible for managing the neighborhood-based communication between devices. For this reason, understanding what, when, and with whom to communicate is crucial for effective and efficient coordination. Consequently, as shown in Figure 2, supporting the 𝜒 component with intelligent communication can be key to improving the system's performance. Learning communication protocols is a well-known problem in the literature. Initially, approaches based on heuristics and meta-heuristics were proposed [56, 57]. However, they often fail to guarantee scalability and adaptability over time, which are essential qualities for systems operating in dynamic and large-scale environments. To overcome these limitations, there has been a significant shift towards multi-agent reinforcement learning techniques. MARL has shown promise in addressing the complexities associated with learning and optimizing communication protocols among multiple agents [58, 59]. In addition to MARL, graph neural networks have been employed to capture relationships between neighboring agents [60, 61]. Vision We believe the proposed many-agent reinforcement learning approach, combined with GNNs, holds great promise for learning communication protocols in the context of pulverized systems. This approach allows for: 1. optimizing the amount of information exchanged in the network; 2. optimizing the set of neighbors to which a message is sent at a given time step; and 3. optimizing the message exchange frequency, thus avoiding large bandwidth consumption and energy waste on individual devices. 4.3. Scheduling Discussion In the pulverization model, no fixed scheduling policy is predefined for computing the new state (i.e., executing the behavior).
This approach gives developers considerable flexibility in choosing how and when to execute the primary behavior. The scheduling policy must be adaptable to the underlying infrastructure layer. For instance, when computations are entirely offloaded to the cloud, there are no significant power constraints. In contrast, when computations are performed on mobile devices like smartphones, power consumption becomes a critical factor. Beyond simple local rules (e.g., reducing evaluation frequency when battery power is low), scheduling may also be influenced by the collective results of ongoing computations. For example, in a crowd scenario, if the situation becomes less crowded, the overall frequency of evaluations may be reduced. Early work in this area, such as programmable distributed scheduling leveraging aggregate computing [62], involved defining scheduling rules based on the collective outcome of computations. However, these approaches assumed a fixed scheduling policy that did not adapt to changing environmental dynamics. Recent studies [23] have shown that simple variations of Q-learning can improve collective computation by converging more rapidly to collective structures. However, these methods relied on centralized learning and Q-tables, making them unsuitable for scaling to the complexity of diverse scenarios. Vision Our vision of embracing many-agent reinforcement learning and graph neural networks offers several advantages: 1. creating distributed global scheduling independent of the number of devices; 2. developing distributed policies influenced by global information or larger neighborhoods; and 3. enabling more complex scheduling functions using neural networks instead of simple tables. We believe this is a promising direction, as recent work has successfully applied this approach to specific task allocation problems [63].
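To make the learned-scheduling idea concrete, the sketch below shows a tabular Q-learning scheduler in the spirit of the simple variants cited above. All states, actions, and reward definitions here are illustrative assumptions; our vision replaces the per-device table with neural networks informed by larger neighborhoods.

```python
# Illustrative tabular Q-learning scheduler: each device learns whether to
# evaluate its behavior at high or low frequency. States, actions, and
# rewards are assumptions made for this sketch only.
import random
from collections import defaultdict

ACTIONS = ["high-frequency", "low-frequency"]

class SchedulingLearner:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy selection over the scheduling actions.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

For instance, a reward trading off the freshness of the collective computation against energy spent would drive crowded regions towards high-frequency evaluation and quiet regions towards low-frequency evaluation.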
Applying it to the pulverization model would enable its use in various collaborative scenarios, opening the possibility of creating services that intelligently optimize certain Quality of Service (QoS) metrics based on the collective layer proposed by the model. 4.4. Applicability Our approach comprehensively addresses the challenge of deploying and dynamically reconfiguring applications in the ECC by injecting intelligence into these processes (cf. Section 4). Typically, application scenarios in the context of the IoT, edge computing, and swarm robotics can benefit from deployment on infrastructures like the ECC, improving both functional and non-functional aspects. For instance, in smart city and wearable technology scenarios, our system offers a dual benefit: extending battery life and minimizing the overall power consumption of the deployed systems. Intelligent reconfiguration policies can be automatically devised and inferred to better relocate the components' execution over the infrastructure, extending the devices' battery life and reducing the CO2e footprint of the system, which would otherwise be difficult or even impossible with traditional approaches like optimization algorithms or rule-based systems (cf. Section 4.1). Similarly, determining the optimal execution policy for each component involves a complex interplay of factors, including the target device, environmental conditions, and specific application constraints. All these aspects may (and usually do) change frequently over time, making them difficult to predict a priori. In a rescue scenario, our intelligent system can dynamically adapt to the emergency by increasing the computational frequency of the devices near the hot spot, providing more up-to-date information about the emergency conditions, while reducing the computational frequency on devices farther away (cf. Section 4.3).
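The distance-dependent evaluation frequency of the rescue scenario could, for instance, follow a simple fall-off profile. The function and all its parameters below are purely illustrative assumptions, not part of the proposed model.

```python
# Illustrative sketch of the rescue scenario: devices nearer the emergency
# hot spot evaluate their behavior more often than distant ones. All
# parameter values are assumed for illustration.
import math

def evaluation_rate_hz(distance_to_hotspot_m, max_rate_hz=10.0,
                       min_rate_hz=0.1, falloff_m=100.0):
    # Exponential fall-off with distance, clamped to a background rate so
    # that even distant devices keep a minimal awareness of the situation.
    rate = max_rate_hz * math.exp(-distance_to_hotspot_m / falloff_m)
    return max(rate, min_rate_hz)
```

In the envisioned system this profile would not be hand-tuned but emerge from the learned, collectively informed scheduling policy.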
Additionally, via the proposed approach, intelligent communication patterns can be automatically devised, determining where the communication should occur (i.e., the physical place where the communication is implemented) and whom it should involve, so as to transfer the minimal but effective information allowing the rescuers to promptly intervene in the emergency area (cf. Section 4.2). 5. Conclusion In this paper, we discussed a vision for creating and deploying intelligent services within the Edge-Cloud Continuum (ECC). Specifically, we proposed intelligent pulverized systems (IPS), which combine the pulverization model with graph neural networks and many-agent reinforcement learning. This solution possesses the characteristics necessary to adapt to modern ECCs, including scalability, the ability to encode collective information, and the definition of components in an architecture-agnostic manner. We believe that this approach is essential for maximizing the potential of ECCs to create opportunistic and intelligent applications. In the near future, we plan to leverage state-of-the-art many-agent algorithms (like mean-field reinforcement learning [51]) to validate this approach in real-world scenarios, such as smart cities and beyond. By doing so, we aim to demonstrate the practical applicability and benefits of IPS in dynamically changing environments. References [1] D. Khalyeyev, T. Bures, P. Hnetynka, Towards characterization of edge-cloud continuum, CoRR abs/2309.05416 (2023). URL: https://doi.org/10.48550/arXiv.2309.05416. doi:10.48550/ARXIV.2309.05416. arXiv:2309.05416. [2] S. Moreschini, F. Pecorelli, X. Li, S. Naz, D. Hästbacka, D. Taibi, Cloud continuum: The definition, IEEE Access 10 (2022) 131876–131886. URL: https://doi.org/10.1109/ACCESS.2022.3229185. doi:10.1109/ACCESS.2022.3229185. [3] D. Rosendo, A. Costan, P. Valduriez, G.
Antoniu, Distributed intelligence on the edge-to-cloud continuum: A systematic literature review, J. Parallel Distributed Comput. 166 (2022) 71–94. URL: https://doi.org/10.1016/j.jpdc.2022.04.004. doi:10.1016/J.JPDC.2022.04.004. [4] M. Satyanarayanan, Pervasive computing: vision and challenges, IEEE Wirel. Commun. 8 (2001) 10–17. URL: https://doi.org/10.1109/98.943998. doi:10.1109/98.943998. [5] G. D. Abowd, Beyond weiser: From ubiquitous to collective computing, Computer 49 (2016) 17–23. URL: https://doi.org/10.1109/MC.2016.22. doi:10.1109/MC.2016.22. [6] M. Friedewald, O. Raabe, Ubiquitous computing: An overview of technology impacts, Telematics Informatics 28 (2011) 55–65. URL: https://doi.org/10.1016/j.tele.2010.09.001. doi:10.1016/J.TELE.2010.09.001. [7] D. Park, S. Kim, Y. An, J. Jung, Lired: A light-weight real-time fault detection system for edge computing using LSTM recurrent neural networks, Sensors 18 (2018) 2110. URL: https://doi.org/10.3390/s18072110. doi:10.3390/S18072110. [8] W. Chang, L. Chen, K. Su, Deepcrash: A deep learning-based internet of vehicles system for head-on and single-vehicle accident detection with emergency notification, IEEE Access 7 (2019) 148163–148175. URL: https://doi.org/10.1109/ACCESS.2019.2946468. doi:10.1109/ACCESS.2019.2946468. [9] Y. Yang, Many-agent reinforcement learning, Ph.D. thesis, University College London (University of London), UK, 2021. URL: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.830054. [10] R. Casadei, D. Pianini, A. Placuzzi, M. Viroli, D. Weyns, Pulverization in cyber-physical systems: Engineering the self-organizing logic separated from deployment, Future Internet 12 (2020) 203. URL: https://doi.org/10.3390/fi12110203. doi:10.3390/FI12110203. [11] R. Casadei, G. Fortino, D. Pianini, A. Placuzzi, C. Savaglio, M. Viroli, A methodology and simulation-based toolchain for estimating deployment performance of smart collective services at the edge, IEEE Internet Things J. 9 (2022) 20136–20148.
URL: https://doi.org/10.1109/JIOT.2022.3172470. doi:10.1109/JIOT.2022.3172470. [12] R. Casadei, Macroprogramming: Concepts, state of the art, and opportunities of macroscopic behaviour modelling, ACM Comput. Surv. 55 (2023) 275:1–275:37. URL: https://doi.org/10.1145/3579353. doi:10.1145/3579353. [13] S. Sharma, V. Chang, U. S. Tim, J. Wong, S. K. Gadia, Cloud and iot-based emerging services systems, Clust. Comput. 22 (2019) 71–91. URL: https://doi.org/10.1007/s10586-018-2821-8. doi:10.1007/S10586-018-2821-8. [14] A. Taherkordi, F. Eliassen, Towards independent in-cloud evolution of cyber-physical systems, in: 2014 IEEE International Conference on Cyber-Physical Systems, Networks, and Applications, CPSNA 2014, Hong Kong, China, August 25-26, 2014, IEEE Computer Society, 2014, pp. 19–24. URL: https://doi.org/10.1109/CPSNA.2014.12. doi:10.1109/CPSNA.2014.12. [15] W. Z. Khan, E. Ahmed, S. Hakak, I. Yaqoob, A. Ahmed, Edge computing: A survey, Future Gener. Comput. Syst. 97 (2019) 219–235. URL: https://doi.org/10.1016/j.future.2019.02.050. doi:10.1016/J.FUTURE.2019.02.050. [16] R. Newton, M. Welsh, Region streams: functional macroprogramming for sensor networks, in: A. Labrinidis, S. Madden (Eds.), Proceedings of the 1st Workshop on Data Management for Sensor Networks, in conjunction with VLDB, DMSN 2004, Toronto, Canada, August 30, 2004, volume 72 of ACM International Conference Proceeding Series, ACM, 2004, pp. 78–87. URL: https://doi.org/10.1145/1052199.1052213. doi:10.1145/1052199.1052213. [17] R. Casadei, G. Fortino, D. Pianini, W. Russo, C. Savaglio, M. Viroli, A development approach for collective opportunistic edge-of-things services, Inf. Sci. 498 (2019) 154–169. URL: https://doi.org/10.1016/j.ins.2019.05.058. doi:10.1016/J.INS.2019.05.058. [18] A. Azzara, D. Alessandrelli, S. Bocchino, M. Petracca, P.
Pagano, Pyot, a macroprogramming framework for the internet of things, in: Proceedings of the 9th IEEE International Symposium on Industrial Embedded Systems, SIES 2014, Pisa, Italy, June 18-20, 2014, IEEE, 2014, pp. 96–103. URL: https://doi.org/10.1109/SIES.2014.6871193. doi:10.1109/SIES.2014.6871193. [19] C. Pinciroli, G. Beltrame, Buzz: A programming language for robot swarms, IEEE Softw. 33 (2016) 97–100. URL: https://doi.org/10.1109/MS.2016.95. doi:10.1109/MS.2016.95. [20] J. Beal, D. Pianini, M. Viroli, Aggregate programming for the internet of things, Computer 48 (2015) 22–30. URL: https://doi.org/10.1109/MC.2015.261. doi:10.1109/MC.2015.261. [21] G. Aguzzi, M. Viroli, L. Esterle, Field-informed reinforcement learning of collective tasks with graph neural networks, in: IEEE International Conference on Autonomic Computing and Self-Organizing Systems, ACSOS 2023, Toronto, ON, Canada, September 25-29, 2023, IEEE, 2023, pp. 37–46. URL: https://doi.org/10.1109/ACSOS58161.2023.00021. doi:10.1109/ACSOS58161.2023.00021. [22] G. Aguzzi, R. Casadei, M. Viroli, Towards reinforcement learning-based aggregate computing, in: M. H. ter Beek, M. Sirjani (Eds.), Coordination Models and Languages - 24th IFIP WG 6.1 International Conference, COORDINATION 2022, Held as Part of the 17th International Federated Conference on Distributed Computing Techniques, DisCoTec 2022, Lucca, Italy, June 13-17, 2022, Proceedings, volume 13271 of Lecture Notes in Computer Science, Springer, 2022, pp. 72–91. URL: https://doi.org/10.1007/978-3-031-08143-9_5. doi:10.1007/978-3-031-08143-9\_5. [23] G. Aguzzi, R. Casadei, M. Viroli, Addressing collective computations efficiency: Towards a platform-level reinforcement learning approach, in: R. Casadei, E. D. Nitto, I. Gerostathopoulos, D. Pianini, I. Dusparic, T. Wood, P. R. Nelson, E. Pournaras, N. Bencomo, S. Götz, C. Krupitzer, C.
Raibulet (Eds.), IEEE International Conference on Autonomic Computing and Self-Organizing Systems, ACSOS 2022, Virtual, CA, USA, September 19-23, 2022, IEEE, 2022, pp. 11–20. URL: https://doi.org/10.1109/ACSOS55765.2022.00019. doi:10.1109/ACSOS55765.2022.00019. [24] G. Ananthanarayanan, P. Bahl, P. Bodík, K. Chintalapudi, M. Philipose, L. Ravindranath, S. Sinha, Real-time video analytics: The killer app for edge computing, Computer 50 (2017) 58–67. URL: https://doi.org/10.1109/MC.2017.3641638. doi:10.1109/MC.2017.3641638. [25] G. Kar, S. Jain, M. Gruteser, F. Bai, R. Govindan, Real-time traffic estimation at vehicular edge nodes, in: J. Zhang, M. Chiang, B. M. Maggs (Eds.), Proceedings of the Second ACM/IEEE Symposium on Edge Computing, San Jose / Silicon Valley, SEC 2017, CA, USA, October 12-14, 2017, ACM, 2017, pp. 3:1–3:13. URL: https://doi.org/10.1145/3132211.3134461. doi:10.1145/3132211.3134461. [26] S. Tuli, N. Basumatary, R. Buyya, Edgelens: Deep learning based object detection in integrated iot, fog and cloud computing environments, CoRR abs/1906.11056 (2019). URL: http://arxiv.org/abs/1906.11056. arXiv:1906.11056. [27] P. J. Navarro, C. Fernández-Isla, R. Borraz, D. Alonso, A machine learning approach to pedestrian detection for autonomous vehicles using high-definition 3d range data, Sensors 17 (2017) 18. URL: https://doi.org/10.3390/s17010018. doi:10.3390/S17010018. [28] X. Chang, W. Li, C. Xia, J. Ma, J. Cao, S. U. Khan, A. Y. Zomaya, From insight to impact: Building a sustainable edge computing platform for smart homes, in: 24th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2018, Singapore, December 11-13, 2018, IEEE, 2018, pp. 928–936. URL: https://doi.org/10.1109/PADSW.2018.8644647. doi:10.1109/PADSW.2018.8644647. [29] I. Kovacevic, E. Harjula, S. Glisic, B. Lorenzo, M. Ylianttila, Cloud and edge computation offloading for latency limited services, IEEE Access 9 (2021) 55764–55776.
URL: https://doi.org/10.1109/ACCESS.2021.3071848. doi:10.1109/ACCESS.2021.3071848. [30] J. Kwak, Y. Kim, J. Lee, S. Chong, DREAM: dynamic resource and task allocation for energy minimization in mobile cloud systems, IEEE J. Sel. Areas Commun. 33 (2015) 2510–2523. URL: https://doi.org/10.1109/JSAC.2015.2478718. doi:10.1109/JSAC.2015.2478718. [31] S. Wang, M. Zafer, K. K. Leung, Online placement of multi-component applications in edge computing environments, IEEE Access 5 (2017) 2514–2533. URL: https://doi.org/10.1109/ACCESS.2017.2665971. doi:10.1109/ACCESS.2017.2665971. [32] X. Hou, C. Li, J. Liu, L. Zhang, Y. Hu, M. Guo, Ant-man: towards agile power management in the microservice era, in: C. Cuicchi, I. Qualters, W. T. Kramer (Eds.), Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020, Virtual Event / Atlanta, Georgia, USA, November 9-19, 2020, IEEE/ACM, 2020, p. 78. URL: https://doi.org/10.1109/SC41405.2020.00082. doi:10.1109/SC41405.2020.00082. [33] Y. Gan, Y. Zhang, K. Hu, D. Cheng, Y. He, M. Pancholi, C. Delimitrou, Seer: Leveraging big data to navigate the complexity of performance debugging in cloud microservices, in: I. Bahar, M. Herlihy, E. Witchel, A. R. Lebeck (Eds.), Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2019, Providence, RI, USA, April 13-17, 2019, ACM, 2019, pp. 19–33. URL: https://doi.org/10.1145/3297858.3304004. doi:10.1145/3297858.3304004. [34] Q. Chen, H. Yang, M. Guo, R. S. Kannan, J. Mars, L. Tang, Prophet: Precise qos prediction on non-preemptive accelerators to improve utilization in warehouse-scale computers, in: Y. Chen, O. Temam, J. Carter (Eds.), Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2017, Xi’an, China, April 8-12, 2017, ACM, 2017, pp. 17–32.
URL: https://doi.org/10.1145/3037697.3037700. doi:10.1145/3037697.3037700. [35] K. Fu, W. Zhang, Q. Chen, D. Zeng, M. Guo, Adaptive resource efficient microservice deployment in cloud-edge continuum, IEEE Trans. Parallel Distributed Syst. 33 (2022) 1825–1840. URL: https://doi.org/10.1109/TPDS.2021.3128037. doi:10.1109/TPDS.2021.3128037. [36] Z. Wang, T. Schaul, M. Hessel, H. van Hasselt, M. Lanctot, N. de Freitas, Dueling network architectures for deep reinforcement learning, in: M. Balcan, K. Q. Weinberger (Eds.), Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, volume 48 of JMLR Workshop and Conference Proceedings, JMLR.org, 2016, pp. 1995–2003. URL: http://proceedings.mlr.press/v48/wangf16.html. [37] G. Nieto, I. de la Iglesia, U. Lopez-Novoa, C. Perfecto, Deep reinforcement learning techniques for dynamic task offloading in the 5g edge-cloud continuum, J. Cloud Comput. 13 (2024) 94. URL: https://doi.org/10.1186/s13677-024-00658-0. doi:10.1186/S13677-024-00658-0. [38] S. Sheng, P. Chen, Z. Chen, L. Wu, Y. Yao, Deep reinforcement learning-based task scheduling in iot edge computing, Sensors 21 (2021) 1666. URL: https://doi.org/10.3390/s21051666. doi:10.3390/S21051666. [39] Y. Wei, Z. Zhang, F. R. Yu, Z. Han, Joint user scheduling and content caching strategy for mobile edge networks using deep reinforcement learning, in: 2018 IEEE International Conference on Communications Workshops, ICC Workshops 2018, Kansas City, MO, USA, May 20-24, 2018, IEEE, 2018, pp. 1–6. URL: https://doi.org/10.1109/ICCW.2018.8403711. doi:10.1109/ICCW.2018.8403711. [40] D. Zeng, L. Gu, S. Pan, J. Cai, S. Guo, Resource management at the network edge: A deep reinforcement learning approach, IEEE Netw. 33 (2019) 26–33. URL: https://doi.org/10.1109/MNET.2019.1800386. doi:10.1109/MNET.2019.1800386. [41] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, P. S.
Yu, A comprehensive survey on graph neural networks, IEEE Trans. Neural Networks Learn. Syst. 32 (2021) 4–24. URL: https://doi.org/10.1109/TNNLS.2020.2978386. doi:10.1109/TNNLS.2020.2978386. [42] X. Yu, W. Wu, P. Feng, Y. Tian, Swarm inverse reinforcement learning for biological systems, in: Y. Huang, L. A. Kurgan, F. Luo, X. Hu, Y. Chen, E. R. Dougherty, A. Kloczkowski, Y. Li (Eds.), IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021, Houston, TX, USA, December 9-12, 2021, IEEE, 2021, pp. 274–279. URL: https://doi.org/10.1109/BIBM52615.2021.9669656. doi:10.1109/BIBM52615.2021.9669656. [43] E. I. Tolstaya, F. Gama, J. Paulos, G. J. Pappas, V. Kumar, A. Ribeiro, Learning decentralized controllers for robot swarms with graph neural networks, in: Proc. of the Conf. on Robot Learning, 2019, pp. 671–682. [44] E. Tolstaya, F. Gama, J. Paulos, G. Pappas, V. Kumar, A. Ribeiro, Learning decentralized controllers for robot swarms with graph neural networks, in: Proc. of the Conf. on Robot Learning, PMLR, 2020, pp. 671–682. [45] W. Gosrich, S. Mayya, R. Li, J. Paulos, M. Yim, A. Ribeiro, V. Kumar, Coverage control in multi-robot systems via graph neural networks, in: Proc. of the Int. Conf. on Robotics and Automation, IEEE, 2022, pp. 8787–8793. doi:10.1109/ICRA46639.2022.9811854. [46] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, G. E. Dahl, Neural message passing for quantum chemistry, in: Proc. of the 34th International Conference on Machine Learning, 2017, pp. 1263–1272. [47] P. K. Sharma, R. Fernandez, E. Zaroukian, M. Dorothy, A. Basak, D. E. Asher, Survey of recent multi-agent reinforcement learning algorithms utilizing centralized training, arXiv preprint arXiv:2107.14316 (2021). URL: https://arxiv.org/abs/2107.14316. [48] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. A. Riedmiller, Playing atari with deep reinforcement learning, CoRR abs/1312.5602 (2013). URL: http://arxiv.org/abs/1312.5602.
arXiv:1312.5602. [49] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms, CoRR abs/1707.06347 (2017). URL: http://arxiv.org/abs/1707.06347. arXiv:1707.06347. [50] C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. M. Bayen, Y. Wu, The surprising effectiveness of PPO in cooperative multi-agent games, in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh (Eds.), Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022. URL: http://papers.nips.cc/paper_files/paper/2022/hash/9c1535a02f0ce079433344e14d910597-Abstract-Datasets_and_Benchmarks.html. [51] Y. Yang, R. Luo, M. Li, M. Zhou, W. Zhang, J. Wang, Mean field multi-agent reinforcement learning, in: J. G. Dy, A. Krause (Eds.), Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, PMLR, 2018, pp. 5567–5576. URL: http://proceedings.mlr.press/v80/yang18d.html. [52] R. Casadei, D. Pianini, M. Viroli, A. Natali, Self-organising coordination regions: A pattern for edge computing, in: H. R. Nielson, E. Tuosto (Eds.), Coordination Models and Languages - 21st IFIP WG 6.1 International Conference, COORDINATION 2019, Held as Part of the 14th International Federated Conference on Distributed Computing Techniques, DisCoTec 2019, Kongens Lyngby, Denmark, June 17-21, 2019, Proceedings, volume 11533 of Lecture Notes in Computer Science, Springer, 2019, pp. 182–199. URL: https://doi.org/10.1007/978-3-030-22397-7_11. doi:10.1007/978-3-030-22397-7\_11. [53] K. Khetarpal, M. Riemer, I. Rish, D. Precup, Towards continual reinforcement learning: A review and perspectives, J. Artif. Intell. Res. 75 (2022) 1401–1476. URL: https://doi.org/10.1613/jair.1.13673. doi:10.1613/JAIR.1.13673. [54] Y. Zhang, W.
Hua, Z. Zhou, G. E. Suh, C. Delimitrou, Sinan: Ml-based and qos-aware resource management for cloud microservices, in: T. Sherwood, E. D. Berger, C. Kozyrakis (Eds.), ASPLOS ’21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Virtual Event, USA, April 19-23, 2021, ACM, 2021, pp. 167–181. URL: https://doi.org/10.1145/3445814.3446693. doi:10.1145/3445814.3446693. [55] Y. Gan, M. Liang, S. Dev, D. Lo, C. Delimitrou, Sage: practical and scalable ml-driven performance debugging in microservices, in: T. Sherwood, E. D. Berger, C. Kozyrakis (Eds.), ASPLOS ’21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Virtual Event, USA, April 19-23, 2021, ACM, 2021, pp. 135–151. URL: https://doi.org/10.1145/3445814.3446700. doi:10.1145/3445814.3446700. [56] C. V. Goldman, J. S. Rosenschein, Emergent coordination through the use of cooperative state-changing rules, in: B. Hayes-Roth, R. E. Korf (Eds.), Proceedings of the 12th National Conference on Artificial Intelligence, Seattle, WA, USA, July 31 - August 4, 1994, Volume 1, AAAI Press / The MIT Press, 1994, pp. 408–413. URL: http://www.aaai.org/Library/AAAI/1994/aaai94-062.php. [57] C. V. Goldman, S. Zilberstein, Optimizing information exchange in cooperative multi-agent systems, in: The Second International Joint Conference on Autonomous Agents & Multiagent Systems, AAMAS 2003, July 14-18, 2003, Melbourne, Victoria, Australia, Proceedings, ACM, 2003, pp. 137–144. URL: https://doi.org/10.1145/860575.860598. doi:10.1145/860575.860598. [58] C. Zhu, M. Dastani, S. Wang, A survey of multi-agent deep reinforcement learning with communication, Auton. Agents Multi Agent Syst. 38 (2024) 4. URL: https://doi.org/10.1007/s10458-023-09633-6. doi:10.1007/S10458-023-09633-6. [59] J. N. Foerster, Y. M. Assael, N. de Freitas, S. Whiteson, Learning to communicate with deep multi-agent reinforcement learning, in: D. D.
Lee, M. Sugiyama, U. von Luxburg, I. Guyon, R. Garnett (Eds.), Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain, 2016, pp. 2137–2145. URL: https://proceedings.neurips.cc/paper/2016/hash/c7635bfd99248a2cdef8249ef7bfbef4-Abstract.html. [60] A. Agarwal, S. Kumar, K. P. Sycara, M. Lewis, Learning transferable cooperative behavior in multi-agent teams, in: A. E. F. Seghrouchni, G. Sukthankar, B. An, N. Yorke-Smith (Eds.), Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, AAMAS ’20, Auckland, New Zealand, May 9-13, 2020, International Foundation for Autonomous Agents and Multiagent Systems, 2020, pp. 1741–1743. URL: https://dl.acm.org/doi/10.5555/3398761.3398967. doi:10.5555/3398761.3398967. [61] J. Jiang, C. Dun, T. Huang, Z. Lu, Graph convolutional reinforcement learning, in: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020, OpenReview.net, 2020. URL: https://openreview.net/forum?id=HkxdQkSYDB. [62] D. Pianini, R. Casadei, M. Viroli, S. Mariani, F. Zambonelli, Time-fluid field-based coordination through programmable distributed schedulers, Log. Methods Comput. Sci. 17 (2021). URL: https://doi.org/10.46298/lmcs-17(4:13)2021. doi:10.46298/LMCS-17(4:13)2021. [63] C. Jian, Z. Pan, L. Bao, M. Zhang, Online-learning task scheduling with gnn-rl scheduler in collaborative edge computing, Cluster Computing 27 (2023) 589–605. URL: http://dx.doi.org/10.1007/s10586-022-03957-w. doi:10.1007/s10586-022-03957-w.