Towards a Domain-Specific Language for Automated Network Management

Tim Molderez, Coen De Roover, Wolfgang De Meuter
Software Languages Lab, Vrije Universiteit Brussel, Brussels, Belgium
tim.molderez@vub.be, coen.de.roover@vub.be, wolfgang.de.meuter@vub.be

Tim Molderez is supported by the FWO-SBO-SMILE-IT project, funded by the Research Foundation Flanders (FWO).

Abstract: Software applications involving networks, in a broad sense of the term, are becoming more complex and are deployed on a growing number of devices. These applications can involve wireless sensor networks, smart grids, intelligent traffic light systems, and so on. Manually managing such networks is becoming increasingly difficult. To automate this management process, this paper introduces the initial design of the Marlon domain-specific language. Marlon is suited to specifying the management policies that should be achieved. It can automatically apply these policies using machine learning techniques, effectively reducing the amount of effort needed to manage such systems.

Index Terms: domain-specific languages, multi-agent systems, machine learning

I. INTRODUCTION

This work is situated in the context of software applications that are meant to be deployed on a network. We use the term network in a broad sense. While it includes the commonly used notion of computer networks, it also covers quite different environments such as wireless sensor networks, power grids or traffic light systems. Hardware plays a large role in such networked environments, but as there is a growing need to make these environments "smart" (e.g. smart grids, intelligent traffic light systems), software is necessary. This software is becoming increasingly complex, and it is deployed in an environment that can potentially scale up to millions of devices. As such, configuring and managing such systems, which is often done manually, does not come easy.

This paper introduces an initial version of Marlon (Multi-Agent Reinforcement Learning On Networks), a domain-specific language (DSL) that aims to simplify automating this management process. To achieve this goal, developers can use Marlon to specify a number of policies or goals that need to be attained. As the DSL's complete name implies, the automation itself is done using reinforcement learning [3], a machine learning technique. The network itself is represented in Marlon as a multi-agent system, a term from the domain of artificial intelligence. Each device in the network is then represented as an agent, which can be roughly defined as an entity that can act autonomously. We chose to implement Marlon as a DSL with the aim of reducing the development and maintenance cost, compared to using a general-purpose language to automate a specific multi-agent system. For instance, Marlon is designed to easily switch from a simulated multi-agent system to deployment in a real environment. It is also possible to specify and combine multiple machine learning goals, independently of which specific machine learning algorithm is used.

Marlon is a DSL implemented on top of the Elixir¹ language. Elixir was chosen as the host language for three main reasons. First, it focuses on building distributed, fault-tolerant systems. Elixir leverages the Erlang VM, which has a proven track record of scaling to very large systems, used by services such as Amazon and WhatsApp. Second, Elixir implements the actor concurrency model, where each actor/process is isolated and can only communicate with other actors via messages. This model coincides well with multi-agent systems, such that each agent corresponds to an actor. Finally, we chose Elixir because it has been designed with extensibility and domain-specific languages in mind. As such, a prototype implementation of Marlon, which essentially consists of a set of macros, could be developed in a short time frame.

The remainder of this paper introduces Marlon by means of an example in Sec. II, and briefly describes its informal semantics in Sec. III.

¹ https://elixir-lang.org/

II. SMART GRID EXAMPLE

To illustrate the use of Marlon, we will discuss a small example in this section. This example is situated in the context of smart grids, i.e. an electrical grid where power usage/production is monitored with the aim of making more efficient use of the available energy. An overview of the example system is given in Fig. 1: it consists of a grid manager and multiple houses, each having a central heating system. The role of the grid manager is to provide power to each house, and to keep track of the total power usage. Each heating system keeps track of how much power it consumes, and its current temperature. The only policy we want to deploy in this example is that each house should reach and maintain its desired temperature.²

² We chose this policy only for its simplicity. Marlon uses machine learning to apply this policy, but there are simpler methods to implement a thermostat. The use of machine learning can be demonstrated in more complex examples, such as a grid where energy is traded between houses, and the optimal selling price is learned. This is part of future work.

Fig. 1. Overview of a simple smart grid system

The entire Marlon source code that specifies how to simulate this system is given in Fig. 2. It is also possible to use Marlon to deploy this system in a real environment, but this is not discussed in more detail in this paper. Before examining the code of Fig. 2 in more detail, it is important to note that multi-agent systems are commonly modeled as discrete systems, which is also reflected in the design of Marlon. This means that the execution of a multi-agent system corresponds to an infinite main loop, where each iteration computes the next state of the system, based on the state of the previous iteration.

The code in Fig. 2 consists of four separate sections, each defining a different part of the smart grid: the defworld statement on lines 1-10 specifies the grid manager; the defcomponent statement on lines 12-25 specifies the central heating system of a house; the defagent statement on lines 27-46 specifies a house. Finally, the defgoal statement on lines 48-68 specifies the goal/policy that each house should reach its desired temperature.

     1  defworld GridManager, [
     2    input_data: [
     3      {:agents, :power_consumption, :as, :agents_power_consumption}
     4    ],
     5    output_data: [
     6      {:data, :power_consumption, fn (_global_state, knowledge) ->
     7        knowledge[:agents_power_consumption] |> elem(1) |> Enum.sum()
     8      end}
     9    ]
    10  ]
    11
    12  defcomponent CentralHeating, [
    13    behaviour: [
    14      on: fn(component, _knowledge, agent_state) ->
    15        agent_state = %{agent_state |
    16          temperature: agent_state.temperature + 1,
    17          power_consumption: agent_state.power_consumption + 100
    18        }
    19        {component, agent_state}
    20      end,
    21      off: fn(component, _knowledge, agent_state) ->
    22        {component, agent_state}
    23      end
    24    ]
    25  ]
    26
    27  defagent House, [
    28    fields: %{
    29      temperature: 5,
    30      power_consumption: 0
    31    },
    32    components: %{
    33      centralheating: CentralHeating
    34    },
    35    input_data: [
    36      {:world, :power_consumption, :as, :world_power_consumption}
    37    ],
    38    output_data: [
    39      {:data, :power_consumption,
    40        fn(_components, agent_state, _knowledge) -> agent_state[:power_consumption] end}
    41    ],
    42    step: fn(_identifier, components, knowledge, agent_state) ->
    43      agent_state = %{agent_state | temperature: agent_state.temperature - 0.125} # Subtraction to account for colder outside temperature
    44      {components, agent_state}
    45    end
    46  ]
    47
    48  defgoal ReachDesiredTemp, [
    49    components: [:centralheating],
    50    attributes: %{target_temperature: 22},
    51    state_fields: [
    52      {:delta_temperature, [-1, 0, 1], fn(attributes, _knowledge, _components, agent_state) ->
    53        %{temperature: temperature} = agent_state
    54        %{target_temperature: target_temperature} = attributes
    55        Utils.sign(temperature - target_temperature) # +1 = too hot, 0 = ok, -1 = too cold
    56      end}
    57    ],
    58    reward: fn (attributes, _components, _old_components, _knowledge, _old_knowledge, agent_state, old_agent_state) ->
    59      target_temperature = attributes.target_temperature
    60      if (abs(agent_state.temperature - target_temperature) <= 1) do
    61        10000
    62      else
    63        old_difference = abs(old_agent_state.temperature - target_temperature)
    64        new_difference = abs(agent_state.temperature - target_temperature)
    65        if (old_difference >= new_difference), do: 5, else: -500
    66      end
    67    end
    68  ]

Fig. 2. Marlon code of the example smart grid

The code that initializes the entire system is the following:

    House.create :h1
    House.create :h2
    House.add_goal :h1, ReachDesiredTemp
    House.add_goal :h2, ReachDesiredTemp
    {:ok, world} = World.start_link()
    World.set_behaviour world, GridManager
    World.add_agent world, :h1
    World.add_agent world, :h2

This code snippet creates two houses, installs the ReachDesiredTemp policy in each house, initializes the grid manager and adds the two houses to it.

We can now examine the code in Fig. 2 in some more detail. At each iteration of the simulation, the machine learning algorithm must first make a decision, based on the goals that have been specified. In our case, there is only the ReachDesiredTemp goal, installed on both houses. Line 49 states that the goal should choose between the behaviours of the centralheating component of a house, defined on line 33. The CentralHeating component itself (lines 12-25) has two behaviours: on or off. Let us assume that the machine learning algorithm has currently decided to choose the "on" behaviour in both houses. These chosen behaviours are now executed: the CentralHeating component updates the state of the house it belongs to by raising its temperature and energy consumption (lines 14-20). After executing the chosen component behaviours, each agent executes its step function (lines 42-45) to make any further adjustments to its state. Once this is done, the grid manager can update its global state, based on the state of each house. More specifically, on lines 5-9, the grid manager computes the total power consumption of all houses. Once this is done, one iteration of the system has finished, and the next one can start.

While walking through the code of this example, we have not yet explained much about how the machine learning algorithm works. The algorithm we have currently implemented is a basic Q-learning [7] algorithm, in which a "Q-table" is maintained to learn which action needs to be taken when the system is in a given state. In this example, there are only two actions: turning the heating component on, or off. Representing the system's state is more complex: as the system can be in an infinite number of different states, an abstraction must be defined over the state in order to create a finite number of abstract states. This abstraction is defined on lines 51-57, in which the current system state is mapped to either -1, 0 or 1. The value 1 represents an abstracted state where the temperature is too hot; -1 is too cold, and 0 is just right. Once the abstracted state space and the list of possible actions are defined, we only need to specify the reward function that computes a reward value for a given combination of current abstracted state and the action that is taken. This function is defined in lines 58-66. This completes the specification of the ReachDesiredTemp goal. To illustrate the Q-learning algorithm in action, Fig. 3 shows how the temperature of one house (y-axis) changes per iteration (x-axis). The learning algorithm keeps increasing the heating system's temperature until it crosses the desired temperature (22 °C) in iteration 20, after which the temperature remains fairly stable. (Note that the temperature slowly drops when the heating is turned off, due to line 43.)

Fig. 3. Temperature evolution of a single house

III. MARLON OVERVIEW

After illustrating Marlon with an example, we can now describe the language's concepts and informal semantics in general terms.

The four main concepts used in the language are: world, agents, components and goals.

World - A Marlon multi-agent system has one "world", an actor that maintains any global state in the system, which is shared with all agents. The input_data and output_data fields (lines 2-9 in Fig. 2) respectively define which data the world receives from its agents, and which parts of its state are published to all agents.

Agent - An agent corresponds to an actor. The fields field (line 28) specifies an agent's internal state. The components field lists which components are contained by this agent. The input_data and output_data fields respectively define which data the agent receives from the world, and which parts of its state are published to the world. An agent also defines a "step" function; this function is used to compute the agent's next state, based on its current state and the world's state.

Component - A component is part of an agent. It can (optionally) have its own internal state. Apart from that, it only contains a number of functions that define the possible behaviours of this component. Only one of these functions is executed at each iteration of the system. Which function will be executed is determined by the machine learning algorithm.³

Goal - Finally, a goal specifies a desired property that an agent should reach, by means of a Q-learning algorithm. The components field (line 49 in Fig. 2) determines which components the machine learning algorithm can control. It is possible to attach multiple goals to the same component, but a weight function (not shown) should then be specified to determine which goal has the highest priority. The attributes field specifies any parameters that may be relevant to the goal. The state_fields field defines the abstract state space used by the Q-learning algorithm, together with a function that maps the current state to an abstracted state. Finally, there is the reward function that computes a reward value for the current state of the system, given the previous state.

³ Alternatively, it is also possible to write your own function that chooses which behaviour is executed, rather than letting the machine learning algorithm choose.

As mentioned before, the multi-agent systems implemented with Marlon are discrete. The execution of such a system corresponds to a loop where each iteration represents the system's next state. The pseudocode in Fig. 4 gives a more precise idea of what happens in each iteration: first, for each goal, an action/behaviour is selected from the components it may affect. This selection is then executed. Next, all agents make their output data available to the world, which the world uses to update its input data. After this, the world and all agents execute their step function. Once this is done, the world publishes its output data, and makes it available as the input data for all agents. The computation of the system's new current state is now finished, and all that remains is to use the reward function of each goal to compute how effective its chosen action was.

    step = 1
    executeAndUpdate(step)

    loop {
      step++
      Action selection + execution
      executeAndUpdate(step)
      Learning reward is computed
    }

    def executeAndUpdate(int x) {
      Agents publish output data
      World updates input data
      World and all agents execute step x
      World publishes output data
      Agents update input data
    }

Fig. 4. Pseudocode for the execution loop of a multi-agent system

IV. RELATED WORK

There are several existing frameworks and domain-specific languages that cater to specific types of multi-agent systems. For instance, Frenetic [2] and Nettle [6] focus on programming computer networks. TeenyLime [1], TinyDB [5] and Semantic Streams [8] tackle querying and composing data in the area of wireless sensor networks. Whereas these papers do not involve machine learning techniques to manage networks, the work of Kara et al. [4] presents a learning-based framework to automate smart grid management. While the example we presented is also situated in a smart grid context, our aim for Marlon is to focus on the more general domain of multi-agent systems.

V. CONCLUSION AND FUTURE WORK

This paper has presented an initial version of Marlon, a DSL for automating the management of multi-agent systems. The DSL was illustrated by means of an example in a smart grid context. As this initial version of the language was also developed starting from this context, one direction of future work is to apply the language to other types of multi-agent systems, and to evolve and extend the language with new features on an as-needed basis. We also need to evaluate the language in terms of its expressiveness, how it compares to frameworks/DSLs that focus on a specific domain, and how effective Marlon is at reaching its machine learning goals. Another direction of future work is to add support for collaboration among agents, so it becomes possible to specify goals that span groups of agents, rather than only specifying goals that apply to individual agents.

REFERENCES

[1] Paolo Costa, Luca Mottola, Amy L Murphy, and Gian Pietro Picco. TeenyLime: transiently shared tuple space middleware for wireless sensor networks. In Proceedings of the international workshop on Middleware for sensor networks, pages 43–48. ACM, 2006.
[2] Nate Foster, Rob Harrison, Michael J Freedman, Christopher Monsanto, Jennifer Rexford, Alec Story, and David Walker. Frenetic: A network programming language. In ACM Sigplan Notices, volume 46, pages 279–291. ACM, 2011.
[3] Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. Reinforcement learning: A survey. Journal of artificial intelligence research, 4:237–285, 1996.
[4] Emre Can Kara, Mario Berges, Bruce Krogh, and Soummya Kar. Using smart devices for system-level management and control in the smart grid: A reinforcement learning framework. In Smart Grid Communications (SmartGridComm), 2012 IEEE Third International Conference on, pages 85–90. IEEE, 2012.
[5] Samuel R Madden, Michael J Franklin, Joseph M Hellerstein, and Wei Hong. TinyDB: an acquisitional query processing system for sensor networks. ACM Transactions on database systems (TODS), 30(1):122–173, 2005.
[6] Andreas Voellmy and Paul Hudak. Nettle: Taking the sting out of programming network routers. Practical Aspects of Declarative Languages, pages 235–249, 2011.
[7] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279–292, 1992.
[8] Kamin Whitehouse, Feng Zhao, and Jie Liu. Semantic streams: A framework for composable semantic interpretation of sensor data. Wireless Sensor Networks, pages 5–20, 2006.
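The Q-learning scheme described in Sec. II (a Q-table over the abstract states -1, 0 and 1 and the two actions on/off, driven by the reward function of Fig. 2) can be sketched in plain Elixir, without Marlon's macros. This is only an illustrative sketch under stated assumptions, not Marlon's actual implementation: the module name QSketch, the learning rate 0.5, the discount factor 0.9, the purely greedy action choice (no exploration) and the abstract_state helper, which stands in for Utils.sign, do not appear in the paper. Only the reward rule and the +1 / -0.125 temperature changes are taken from Fig. 2.

```elixir
defmodule QSketch do
  @moduledoc "Hypothetical tabular Q-learning sketch; not Marlon's implementation."

  @actions [:on, :off]
  @alpha 0.5 # learning rate (assumed, not given in the paper)
  @gamma 0.9 # discount factor (assumed, not given in the paper)

  # Abstract state as in Fig. 2, lines 51-57: 1 = too hot, 0 = ok, -1 = too cold.
  def abstract_state(temperature, target) do
    diff = temperature - target

    cond do
      diff > 0 -> 1
      diff < 0 -> -1
      true -> 0
    end
  end

  # Reward rule copied from Fig. 2, lines 58-67.
  def reward(old_temp, new_temp, target) do
    if abs(new_temp - target) <= 1 do
      10_000
    else
      old_difference = abs(old_temp - target)
      new_difference = abs(new_temp - target)
      if old_difference >= new_difference, do: 5, else: -500
    end
  end

  # Greedy choice: the action with the highest Q-value in this state.
  # Unseen {state, action} pairs count as 0.0; ties resolve to the first action.
  def best_action(q, state) do
    Enum.max_by(@actions, fn action -> Map.get(q, {state, action}, 0.0) end)
  end

  # Standard Q-learning update:
  # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
  def update(q, state, action, r, next_state) do
    old = Map.get(q, {state, action}, 0.0)
    best_next = @actions |> Enum.map(&Map.get(q, {next_state, &1}, 0.0)) |> Enum.max()
    Map.put(q, {state, action}, old + @alpha * (r + @gamma * best_next - old))
  end

  # One iteration for a single house, in the order of Fig. 4: select and
  # execute a behaviour, run the agent's step function, then compute the reward.
  def iterate({q, temp}, target) do
    state = abstract_state(temp, target)
    action = best_action(q, state)
    heated = if action == :on, do: temp + 1, else: temp
    new_temp = heated - 0.125 # step function of Fig. 2, line 43
    r = reward(temp, new_temp, target)
    {update(q, state, action, r, abstract_state(new_temp, target)), new_temp}
  end
end
```

Starting from an empty table and the initial temperature of 5, `Enum.reduce(1..20, {%{}, 5}, fn _, acc -> QSketch.iterate(acc, 22) end)` keeps choosing :on and ends at a temperature of 22.5, which is consistent with the temperature crossing 22 °C in iteration 20 as described for Fig. 3. A real learner would normally also explore non-greedy actions; that part is deliberately left out here.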