=Paper=
{{Paper
|id=Vol-2047/BENEVOL_2017_paper_11
|storemode=property
|title=Towards a Domain-Specific Language for Automated Network Management
|pdfUrl=https://ceur-ws.org/Vol-2047/BENEVOL_2017_paper_11.pdf
|volume=Vol-2047
|authors=Tim Molderez,Coen De Roover,Wolfgang De Meuter
|dblpUrl=https://dblp.org/rec/conf/benevol/MolderezRM17
}}
==Towards a Domain-Specific Language for Automated Network Management==
Tim Molderez, Coen De Roover, Wolfgang De Meuter
Software Languages Lab, Vrije Universiteit Brussel, Brussels, Belgium
tim.molderez@vub.be, coen.de.roover@vub.be, wolfgang.de.meuter@vub.be
Abstract—Software applications involving networks, in a broad sense of the term, are becoming more complex and are deployed on a growing number of devices. These applications can involve wireless sensor networks, smart grids, intelligent traffic light systems, and so on. Manually managing such networks is becoming increasingly difficult. To automate this management process, this paper introduces the initial design of the Marlon domain-specific language. Marlon is suited to specify the desired management policies that should be achieved. It can automatically apply these policies using machine learning techniques, effectively reducing the amount of effort needed to manage such systems.
Index Terms—domain-specific languages, multi-agent systems, machine learning
Fig. 1. Overview of a simple smart grid system
I. INTRODUCTION
This work is situated in the context of software applications that are meant to be deployed on a network. We use the term network in a broad sense. While it includes the commonly used notion of computer networks, it also involves quite different environments such as wireless sensor networks, power grids or traffic light systems. Hardware plays a large role in such networked environments, but as there is a growing need to make these environments “smart” (e.g. smart grids, intelligent traffic light systems), software is necessary. This software is becoming increasingly complex, and it is deployed in an environment that can potentially scale up to millions of devices. As such, configuring and managing such systems, which is often done manually, does not come easy.
This paper introduces an initial version of Marlon (Multi-Agent Reinforcement Learning On Networks), a domain-specific language (DSL) that aims to simplify automating this management process. To achieve this goal, developers can use Marlon to specify a number of policies or goals that need to be attained. As the DSL’s complete name implies, the automation itself is done using reinforcement learning [3], a machine learning technique. The network itself is represented in Marlon as a multi-agent system, a term from the domain of artificial intelligence. Each device in the network is then represented as an agent, which can be roughly defined as an entity that can act autonomously. We chose to implement Marlon as a DSL, with the aim of reducing the development and maintenance cost, compared to using a general-purpose language to automate a specific multi-agent system. For instance, Marlon is designed to easily switch from a simulated multi-agent system to deployment in a real environment. It is also possible to specify and combine multiple machine learning goals, without depending on which specific machine learning algorithm is used.
Marlon is a DSL implemented on top of the Elixir¹ language. Elixir was chosen as the host language for three main reasons. First, it focuses on building distributed, fault-tolerant systems. Elixir leverages the Erlang VM, which has a proven track record of scaling to very large systems, used by services such as Amazon and WhatsApp. Second, Elixir implements the actor concurrency model, where each actor/process is isolated and can only communicate with other actors via messages. This model maps well onto multi-agent systems, such that each agent corresponds to an actor. Finally, we chose Elixir because it has been designed with extensibility and domain-specific languages in mind. As such, a prototype implementation of Marlon, which essentially consists of a set of macros, could be developed in a short time frame.
The remainder of this paper introduces Marlon by means of an example in Sec. II, and briefly describes its informal semantics in Sec. III.
¹ https://elixir-lang.org/
Tim Molderez is supported by the FWO-SBO-SMILE-IT project, funded by the Research Foundation Flanders (FWO).
II. SMART GRID EXAMPLE
To illustrate the use of Marlon, we will discuss a small example in this section. This example is situated in the context of smart grids, i.e. an electrical grid where power
usage/production is monitored with the aim of making a more efficient use of the available energy. An overview of the example system is given in Fig. 1: it consists of a grid manager and multiple houses, each having a central heating system. The role of the grid manager is to provide power to each house, and to keep track of the total power usage. Each heating system keeps track of how much power it consumes, and its current temperature. The only policy we want to deploy in this example is that each house should reach and maintain its desired temperature.²
The entire Marlon source code that specifies how to simulate this system is given in Fig. 2. It is also possible to use Marlon to deploy this system in a real environment, but this is not discussed in more detail in this paper. Before examining the code of Fig. 2 in more detail, it is important to note that multi-agent systems are commonly modeled as discrete systems, which is also reflected in the design of Marlon. It means that the execution of a multi-agent system corresponds to an infinite main loop, where each iteration computes the next state of the system, based on the state of the previous iteration.
The code in Fig. 2 consists of four separate sections, each defining a different part of the smart grid: the defworld statement on lines 1-10 specifies the grid manager; the defcomponent statement on lines 12-25 specifies the central heating system of a house; the defagent statement on lines 27-46 represents the specification of a house. Finally, the defgoal statement on lines 48-68 specifies the goal/policy that each house should reach its desired temperature.
The code that initializes the entire system is the following:
House.create :h1
House.create :h2
House.add_goal :h1, ReachDesiredTemp
House.add_goal :h2, ReachDesiredTemp
{:ok, world} = World.start_link()
World.set_behaviour world, GridManager
World.add_agent world, :h1
World.add_agent world, :h2
This code snippet creates two houses, installs the ReachDesiredTemp policy in each house, initializes the grid manager and adds the two houses to it.
We can now examine the code in Fig. 2 in some more detail. At each iteration of the simulation, the machine learning algorithm must first make a decision, based on the goals that have been specified. In our case, there is only the ReachDesiredTemp goal, installed on both houses. Line 49 states that the goal should choose between the behaviours of the centralheating component of a house, defined on line 33. The CentralHeating component itself (lines 12-25) has two behaviours: on and off. Let us assume that the machine learning algorithm has currently decided to choose the “on” behaviour in both houses. These chosen behaviours are now executed: the CentralHeating component updates the state of the house it belongs to by raising its temperature and energy consumption (lines 14-20). After executing the chosen component behaviours, each agent executes its step function (lines 42-45) to make any further adjustments to its state. Once this is done, the grid manager can update its global state, based on the state of each house. More specifically, on lines 5-9, the grid manager computes the total power consumption of all houses. Once this is done, one iteration of the system has finished, and the next one can start.
While walking through the code of this example, we have not yet explained much about how the machine learning algorithm works. The algorithm we have currently implemented is a basic Q-learning [7] algorithm, in which a “Q-table” is maintained to learn which action needs to be taken when the system is in a given state. In this example, there are only two actions: turning the heating component on, or off. Representing the system’s state is more complex: as the system can be in an infinite number of different states, an abstraction must be defined over the state in order to create a finite number of abstract states. This abstraction is defined on lines 51-57, in which the current system state is mapped to either -1, 0 or 1. The value 1 represents an abstracted state where the temperature is too hot; -1 is too cold, and 0 is just right. Once the abstracted state space and the list of possible actions are defined, we only need to specify the reward function that computes a reward value for a given combination of the current abstracted state and the action that is taken. This function is defined on lines 58-67. This completes the specification of the ReachDesiredTemp goal. To illustrate the Q-learning algorithm in action, Fig. 3 shows how the temperature of one house (y-axis) changes per iteration (x-axis). The learning algorithm keeps increasing the heating system’s temperature, until it crosses the desired temperature (22 °C) in iteration 20, after which the temperature remains fairly stable. (Note that the temperature slowly drops when the heating is turned off due to line 43.)
² We chose this policy only for its simplicity. Marlon uses machine learning to apply this policy, but there are simpler methods to implement a thermostat. The use of machine learning can be demonstrated in more complex examples, such as a grid where energy is traded between houses, and the optimal selling price is learned. This is part of future work.
III. MARLON OVERVIEW
After illustrating Marlon with an example, we can now describe the language’s concepts and informal semantics in general terms.
The four main concepts used in the language are: world, agents, components and goals.
World - A Marlon multi-agent system has one “world”, an actor that maintains any global state in the system, which is shared with all agents. The input_data and output_data fields (lines 2-9 in Fig. 2) respectively define which data the world receives from its agents, and which parts of its state are published to all agents.
Agent - An agent corresponds to an actor. The fields field (line 28) specifies an agent’s internal state. The components field lists which components are contained by this agent. The input_data and output_data fields respectively define which data the agent receives from the world, and which parts of its state are published to the world. An agent also defines a “step” function; this function is used
1  defworld GridManager, [
2    input_data: [
3      {:agents, :power_consumption, :as, :agents_power_consumption}
4    ],
5    output_data: [
6      {:data, :power_consumption, fn (_global_state, knowledge) ->
7        knowledge[:agents_power_consumption] |> elem(1) |> Enum.sum()
8      end}
9    ]
10 ]
11
12 defcomponent CentralHeating, [
13   behaviour: [
14     on: fn(component, _knowledge, agent_state) ->
15       agent_state = %{agent_state |
16         temperature: agent_state.temperature + 1,
17         power_consumption: agent_state.power_consumption + 100
18       }
19       {component, agent_state}
20     end,
21     off: fn(component, _knowledge, agent_state) ->
22       {component, agent_state}
23     end
24   ]
25 ]
26
27 defagent House, [
28   fields: %{
29     temperature: 5,
30     power_consumption: 0
31   },
32   components: %{
33     centralheating: CentralHeating
34   },
35   input_data: [
36     {:world, :power_consumption, :as, :world_power_consumption}
37   ],
38   output_data: [
39     {:data, :power_consumption,
40       fn(_components, agent_state, _knowledge) -> agent_state[:power_consumption] end}
41   ],
42   step: fn(_identifier, components, knowledge, agent_state) ->
43     agent_state = %{agent_state | temperature: agent_state.temperature - 0.125} # Subtraction to account for colder outside temperature
44     {components, agent_state}
45   end
46 ]
47
48 defgoal ReachDesiredTemp, [
49   components: [:centralheating],
50   attributes: %{target_temperature: 22},
51   state_fields: [
52     {:delta_temperature, [-1, 0, 1], fn(attributes, _knowledge, _components, agent_state) ->
53       %{temperature: temperature} = agent_state
54       %{target_temperature: target_temperature} = attributes
55       Utils.sign(temperature - target_temperature) # +1 = too hot, 0 = ok, -1 = too cold
56     end}
57   ],
58   reward: fn (attributes, _components, _old_components, _knowledge, _old_knowledge, agent_state, old_agent_state) ->
59     target_temperature = attributes.target_temperature
60     if (abs(agent_state.temperature - target_temperature) <= 1) do
61       10000
62     else
63       old_difference = abs(old_agent_state.temperature - target_temperature)
64       new_difference = abs(agent_state.temperature - target_temperature)
65       if (old_difference >= new_difference), do: 5, else: -500
66     end
67   end
68 ]
Fig. 2. Marlon code of the example smart grid
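The ReachDesiredTemp goal in Fig. 2 only declares the abstract state space and the reward; the Q-table update itself happens inside Marlon and is not shown in the paper. As a rough illustration of what a basic tabular Q-learning step could look like for this thermostat, the following stand-alone Elixir sketch may help. It does not use Marlon, and the module name, the learning rate, the discount factor and the purely greedy action choice (no exploration) are all assumptions made for this sketch.

```elixir
defmodule ThermostatQLearning do
  # Hypothetical stand-alone sketch; none of these names are part of Marlon.
  @actions [:on, :off]
  @alpha 0.5    # learning rate (assumed value)
  @gamma 0.9    # discount factor (assumed value)
  @target 22    # target temperature, as in line 50 of Fig. 2

  # Abstract state as in lines 51-57 of Fig. 2:
  # 1 = too hot, 0 = ok, -1 = too cold.
  def abstract_state(temperature) do
    diff = temperature - @target

    cond do
      diff > 0 -> 1
      diff < 0 -> -1
      true -> 0
    end
  end

  # Reward mirroring lines 58-67 of Fig. 2.
  def reward(old_temp, new_temp) do
    old_diff = abs(old_temp - @target)
    new_diff = abs(new_temp - @target)

    cond do
      new_diff <= 1 -> 10_000
      old_diff >= new_diff -> 5
      true -> -500
    end
  end

  # Greedy action for a state; unseen Q-table entries count as 0.0.
  def best_action(q, state) do
    Enum.max_by(@actions, fn a -> Map.get(q, {state, a}, 0.0) end)
  end

  # Standard Q-learning update:
  # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
  def update(q, state, action, r, next_state) do
    best_next =
      @actions
      |> Enum.map(&Map.get(q, {next_state, &1}, 0.0))
      |> Enum.max()

    old = Map.get(q, {state, action}, 0.0)
    Map.put(q, {state, action}, old + @alpha * (r + @gamma * best_next - old))
  end

  # Environment dynamics taken from Fig. 2: "on" adds 1 degree (line 16),
  # and every step loses 0.125 degrees (line 43).
  def apply_action(temp, :on), do: temp + 1 - 0.125
  def apply_action(temp, :off), do: temp - 0.125

  # Runs the given number of iterations and returns the final temperature.
  def run(steps, temp \\ 5) do
    {_q, final_temp} =
      Enum.reduce(1..steps, {%{}, temp}, fn _i, {q, t} ->
        state = abstract_state(t)
        action = best_action(q, state)
        new_t = apply_action(t, action)
        q = update(q, state, action, reward(t, new_t), abstract_state(new_t))
        {q, new_t}
      end)

    final_temp
  end
end
```

Under these assumptions, `ThermostatQLearning.run(200)` should end within about a degree of the target temperature. A more realistic learner would also explore, for example by occasionally picking a random action instead of the greedy one.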
Fig. 3. Temperature evolution of a single house
step = 1
executeAndUpdate(step)

loop {
    step++
    Action selection + execution
    executeAndUpdate(step)
    Learning reward is computed
}

def executeAndUpdate(int x) {
    Agents publish output data
    World updates input data
    World and all agents execute step x
    World publishes output data
    Agents update input data
}
Fig. 4. Pseudocode for the execution loop of a multi-agent system
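To make the ordering in Fig. 4 more concrete, the following Elixir sketch plays through one loop iteration for the smart grid example. It is a simplified, single-process stand-in: agents and the world are plain maps rather than actors, the module and function names are hypothetical, and the reward computation at the end of the loop body is left out.

```elixir
defmodule LoopSketch do
  # Hypothetical single-process stand-in for the loop in Fig. 4.
  # Marlon itself represents the world and the agents as actors.

  # Corresponds to executeAndUpdate in Fig. 4.
  def execute_and_update(world, agents) do
    # Agents publish output data; the world updates its input data.
    consumptions = Enum.map(agents, fn {_id, a} -> a.power_consumption end)

    # World step: total power consumption of all houses.
    world = %{world | power_consumption: Enum.sum(consumptions)}

    # Agent step: each house loses some heat (line 43 of Fig. 2).
    agents =
      Map.new(agents, fn {id, a} ->
        {id, %{a | temperature: a.temperature - 0.125}}
      end)

    # World publishes output data; agents update their input data.
    agents =
      Map.new(agents, fn {id, a} ->
        {id, Map.put(a, :world_power_consumption, world.power_consumption)}
      end)

    {world, agents}
  end

  # One loop iteration: action selection and execution, then executeAndUpdate.
  # `choose` stands in for the learning algorithm and maps an agent to :on or :off.
  def iterate(world, agents, choose) do
    agents = Map.new(agents, fn {id, a} -> {id, act(a, choose.(a))} end)
    execute_and_update(world, agents)
  end

  # Behaviours of the heating component (lines 14-23 of Fig. 2).
  defp act(a, :on) do
    %{a | temperature: a.temperature + 1, power_consumption: a.power_consumption + 100}
  end

  defp act(a, :off), do: a
end
```

For instance, starting from two houses at 5 degrees with `world = %{power_consumption: 0}`, calling `LoopSketch.iterate(world, agents, fn _ -> :on end)` turns both heaters on, lets each house lose 0.125 degrees, and hands the total consumption back to both agents.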
to compute the agent’s next state, based on its current state and the world’s state.
Component - A component is part of an agent. It can (optionally) have its own internal state. Apart from this state, it only contains a number of functions that define the possible behaviours of this component. Only one of these functions is executed at each iteration of the system. Which function will be executed is determined by the machine learning algorithm.³
Goal - Finally, a goal specifies a desired property that an agent should reach, by means of a Q-learning algorithm. The components field (line 49 in Fig. 2) determines which components the machine learning algorithm can control. It is possible to attach multiple goals to the same component, but a weight function (not shown) should then be specified to determine which goal has the highest priority. The attributes field specifies any parameters that may be relevant to the goal. The state_fields field defines the abstract state space used by the Q-learning algorithm, together with a function that maps the current state to an abstracted state. Finally, there is the reward function that computes a reward value for the current state of the system, given the previous state.
As mentioned before, the multi-agent systems implemented with Marlon are discrete. The execution of such a system corresponds to a loop where each iteration represents the system’s next state. The pseudocode in Fig. 4 gives a more precise idea of what happens in each iteration: first, for each goal, an action/behaviour is selected from the components it may affect. This selection is then executed. Next, all agents make their output data available to the world, which the world uses to update its input data. After this, all agents execute their step function. Once this is done, the world publishes its output data, and makes it available as the input data for all agents. The computation of the system’s new current state is now finished, and all that remains is to use the reward function of each goal to compute how effective its chosen action was.
³ Alternatively, it is also possible to write your own function that chooses which behaviour is executed, rather than letting the machine learning algorithm choose.
IV. RELATED WORK
Regarding related work, there are several existing frameworks and domain-specific languages that cater to specific types of multi-agent systems. For instance, Frenetic [2] and Nettle [6] focus on programming computer networks. TeenyLime [1], TinyDb [5] and Semantic Streams [8] tackle querying and composing data in the area of wireless sensor networks. Whereas these papers do not involve machine learning techniques to manage networks, the work of Kara et al. [4] presents a learning-based framework to automate smart grid management. While the example we presented is also situated in a smart grid context, our aim for Marlon is to focus on the more general domain of multi-agent systems.
V. CONCLUSION AND FUTURE WORK
This paper has presented an initial version of Marlon, a DSL for automating the management of multi-agent systems. The DSL was illustrated by means of an example in a smart grid context. As this initial version of the language was also developed starting from this context, one direction of future work is to apply the language in other types of multi-agent systems, and to evolve and extend the language with new features on an as-needed basis. We also need to evaluate the language in terms of its expressiveness, how it compares to frameworks/DSLs that focus on a specific domain, and how effective it is at reaching its machine learning goals. Another direction of future work is to add support for collaboration among agents, so it becomes possible to specify goals that span groups of agents, rather than only specifying goals that apply to individual agents.
REFERENCES
[1] Paolo Costa, Luca Mottola, Amy L Murphy, and Gian Pietro Picco. Teenylime: transiently shared tuple space middleware for wireless sensor networks. In Proceedings of the international workshop on Middleware for sensor networks, pages 43–48. ACM, 2006.
[2] Nate Foster, Rob Harrison, Michael J Freedman, Christopher Monsanto, Jennifer Rexford, Alec Story, and David Walker. Frenetic: A network programming language. In ACM Sigplan Notices, volume 46, pages 279–291. ACM, 2011.
[3] Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. Reinforcement learning: A survey. Journal of artificial intelligence research, 4:237–285, 1996.
[4] Emre Can Kara, Mario Berges, Bruce Krogh, and Soummya Kar. Using smart devices for system-level management and control in the smart grid: A reinforcement learning framework. In Smart Grid Communications (SmartGridComm), 2012 IEEE Third International Conference on, pages 85–90. IEEE, 2012.
[5] Samuel R Madden, Michael J Franklin, Joseph M Hellerstein, and Wei Hong. Tinydb: an acquisitional query processing system for sensor networks. ACM Transactions on database systems (TODS), 30(1):122–173, 2005.
[6] Andreas Voellmy and Paul Hudak. Nettle: Taking the sting out of programming network routers. Practical Aspects of Declarative Languages, pages 235–249, 2011.
[7] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279–292, 1992.
[8] Kamin Whitehouse, Feng Zhao, and Jie Liu. Semantic streams: A framework for composable semantic interpretation of sensor data. Wireless Sensor Networks, pages 5–20, 2006.