Towards a Domain-Specific Language for
                 Automated Network Management
                  Tim Molderez                            Coen De Roover                                 Wolfgang De Meuter
             Software Languages Lab                   Software Languages Lab                            Software Languages Lab
             Vrije Universiteit Brussel               Vrije Universiteit Brussel                        Vrije Universiteit Brussel
                 Brussels, Belgium                        Brussels, Belgium                                 Brussels, Belgium
               tim.molderez@vub.be                     coen.de.roover@vub.be                           wolfgang.de.meuter@vub.be


   Abstract—Software applications involving networks, in a broad
sense of the term, are becoming more complex and are deployed
on a growing number of devices. These applications can involve
wireless sensor networks, smart grids, intelligent traffic light sys-
tems, and so on. Manually managing such networks is becoming
increasingly difficult. To automate this management process, this
paper introduces the initial design of the Marlon domain-specific
language. Marlon is suited to specify the desired management
policies that should be achieved. It can automatically apply these
policies using machine learning techniques, effectively reducing
the amount of effort needed to manage such systems.
   Index Terms—domain-specific languages, multi-agent systems,
machine learning

              Fig. 1. Overview of a simple smart grid system

                       I. INTRODUCTION
   This work is situated in the context of software applications
that are meant to be deployed on a network. We use the term
network in a broad sense. While it includes the commonly used
notion of computer networks, it also involves quite different
environments such as wireless sensor networks, power grids
or traffic light systems. Hardware plays a large role in such
networked environments, but as there is a growing need to
make these environments “smart” (e.g. smart grids, intelligent
traffic light systems), software is necessary. This software is
becoming increasingly complex, and it is deployed in
an environment that can potentially scale up to millions of
devices. As such, configuring and managing such systems,
which is often done manually, does not come easy.
   This paper introduces an initial version of Marlon (Multi-Agent
Reinforcement Learning On Networks), a domain-specific
language (DSL) that aims to simplify automating this
management process. To achieve this goal, developers can use
Marlon to specify a number of policies or goals that need to be
attained. As the DSL’s complete name implies, the automation
itself is done using reinforcement learning [3], a machine
learning technique. The network itself is represented in Marlon
as a multi-agent system, a term from the domain of artificial
intelligence. Each device in the network is then represented
as an agent, which can be roughly defined as an entity that
can act autonomously. We chose to implement Marlon as a
DSL with the aim of reducing the development and maintenance
cost, compared to using a general-purpose language to
automate a specific multi-agent system. For instance, Marlon is
designed to easily switch from a simulated multi-agent system
to deployment in a real environment. It is also possible to
specify and combine multiple machine learning goals, without
depending on which specific machine learning algorithm is
used.
   Marlon is a DSL implemented on top of the Elixir¹ language.
Elixir was chosen as the host language for three main
reasons. First, it focuses on building distributed, fault-tolerant
systems. Elixir leverages the Erlang VM, which has a proven
track record of scaling to very large systems, used by services
such as Amazon and WhatsApp. Second, Elixir implements
the actor concurrency model, where each actor/process is
isolated and can only communicate with other actors via messages.
This model fits multi-agent systems well, as each agent
corresponds to an actor. Finally, we
chose Elixir because it has been designed with extensibility
and domain-specific languages in mind. As such, a prototype
implementation of Marlon, which essentially consists of a set
of macros, could be developed in a short time frame.
   The remainder of this paper introduces Marlon by means
of an example in Sec. II, and briefly describes its informal
semantics in Sec. III.

   Tim Molderez is supported by the FWO-SBO-SMILE-IT project, funded
by the Research Foundation Flanders (FWO).
   ¹ https://elixir-lang.org/

                    II. SMART GRID EXAMPLE
   To illustrate the use of Marlon, we will discuss a small
example in this section. This example is situated in the
context of smart grids, i.e. an electrical grid where power
usage/production is monitored with the aim of making more
efficient use of the available energy. An overview of the
example system is given in Fig. 1: it consists of a grid manager
and multiple houses, each having a central heating system.
The role of the grid manager is to provide power to each
house, and to keep track of the total power usage. Each heating
system keeps track of how much power it consumes, and its
current temperature. The only policy we want to deploy in
this example is that each house should reach and maintain its
desired temperature.²

   ² We chose this policy only for its simplicity. Marlon uses machine learning
to apply this policy, but there are simpler methods to implement a thermostat.
The use of machine learning can be demonstrated in more complex examples,
such as a grid where energy is traded between houses, and the optimal selling
price is learned. This is part of future work.

   The entire Marlon source code that specifies how to simulate
this system is given in Fig. 2. It is also possible to use Marlon
to deploy this system in a real environment, but this is not
discussed in more detail in this paper. Before examining the
code of Fig. 2 in more detail, it is important to note that
multi-agent systems are commonly modeled as discrete systems,
which is also reflected in the design of Marlon. This means
that the execution of a multi-agent system corresponds to an
infinite main loop, where each iteration computes the next state
of the system, based on the state of the previous iteration.
   The code in Fig. 2 consists of four separate sections, each
defining a different part of the smart grid: the defworld
statement on lines 1-10 specifies the grid manager; the
defcomponent statement on lines 12-25 specifies the central
heating system of a house; the defagent statement on
lines 27-46 represents the specification of a house. Finally, the
defgoal statement on lines 48-68 specifies the goal/policy
that each house should reach its desired temperature.
   The code that initializes the entire system is the following:
House.create :h1
House.create :h2
House.add_goal :h1, ReachDesiredTemp
House.add_goal :h2, ReachDesiredTemp
{:ok, world} = World.start_link()
World.set_behaviour world, GridManager
World.add_agent world, :h1
World.add_agent world, :h2
   This code snippet creates two houses, installs the
ReachDesiredTemp policy in each house, initializes the
grid manager and adds the two houses to it.
   We can now examine the code in Fig. 2 in some more
detail. At each iteration of the simulation, the machine learning
algorithm must first make a decision, based on the goals
that have been specified. In our case, there is only the
ReachDesiredTemp goal, installed on both houses. Line
49 states that the goal should choose between the behaviours
of the centralheating component of a house, defined
on line 33. The CentralHeating component itself (lines 12-25)
has two behaviours: on or off. Let us assume that the
machine learning algorithm has currently decided to choose
the “on” behaviour in both houses. These chosen behaviours
are now executed: the CentralHeating component updates the
state of the house it belongs to by raising its temperature and
energy consumption (lines 14-20). After executing the chosen
component behaviours, each agent executes its step function
(lines 42-45) to make any further adjustments to its state. Once
this is done, the grid manager can update its global state, based
on the state of each house. More specifically, on lines 5-9,
the grid manager computes the total power consumption of
all houses. Once this is done, one iteration of the system has
finished, and the next one can start.
   While walking through the code of this example, we have
not yet explained how the machine learning algorithm
works. The algorithm we have currently implemented
is a basic Q-learning [7] algorithm, in which a “Q-table” is
maintained to learn which action needs to be taken when
the system is in a given state. In this example, there are
only two actions: turning the heating component on, or off.
Representing the system’s state is more complex: as the system
can be in an infinite number of different states, an abstraction
must be defined over the state in order to create a finite number
of abstract states. This abstraction is defined on lines 51-57,
in which the current system state is mapped to either -1, 0
or 1. The 1 value represents an abstracted state where the
temperature is too hot; -1 is too cold, and 0 is just right.
Once the abstracted state space and the list of possible actions
are defined, we only need to specify the reward function that
computes a reward value for a given combination of current
abstracted state and the action that is taken. This function
is defined on lines 58-67. This completes the specification of
the ReachDesiredTemp goal. To illustrate the Q-learning
algorithm in action, Fig. 3 shows how the temperature of
one house (y-axis) changes per iteration (x-axis). The learning
algorithm keeps increasing the heating system’s temperature
until it crosses the desired temperature (22 °C) in iteration 20,
after which the temperature remains fairly stable. (Note that
the temperature slowly drops when the heating is turned off
due to line 43.)

                    III. MARLON OVERVIEW
   After illustrating Marlon with an example, we can now
describe the language’s concepts and informal semantics in
general terms.
   The four main concepts used in the language are: world,
agents, components and goals.
   World - A Marlon multi-agent system has one “world”,
an actor that maintains any global state in the system,
which is shared with all agents. The input_data and
output_data fields (lines 2-9 in Fig. 2) respectively define
which data the world receives from its agents, and which parts
of its state are published to all agents.
   Agent - An agent corresponds to an actor. The fields
field (line 28) specifies an agent’s internal state. The
components field lists which components are contained
by this agent. The input_data and output_data fields
respectively define which data the agent receives from the
world, and which parts of its state are published to the world.
An agent also defines a “step” function; this function is used


 1   defworld GridManager, [
 2     input_data: [
 3       {:agents, :power_consumption, :as, :agents_power_consumption}
 4     ],
 5     output_data: [
 6       {:data, :power_consumption, fn (_global_state, knowledge) ->
 7         knowledge[:agents_power_consumption] |> elem(1) |> Enum.sum()
 8       end}
 9     ]
10   ]
11
12   defcomponent CentralHeating, [
13     behaviour: [
14       on: fn(component, _knowledge, agent_state) ->
15         agent_state = %{agent_state |
16            temperature: agent_state.temperature + 1,
17            power_consumption: agent_state.power_consumption + 100
18         }
19         {component, agent_state}
20       end,
21       off: fn(component, _knowledge, agent_state) ->
22         {component, agent_state}
23       end
24     ]
25   ]
26
27   defagent House, [
28     fields: %{
29       temperature: 5,
30       power_consumption: 0
31     },
32     components: %{
33       centralheating: CentralHeating
34     },
35     input_data: [
36       {:world, :power_consumption, :as, :world_power_consumption}
37     ],
38     output_data: [
39       {:data, :power_consumption,
40        fn(_components, agent_state, _knowledge) -> agent_state[:power_consumption] end}
41     ],
42     step: fn(_identifier, components, knowledge, agent_state) ->
43       agent_state = %{agent_state | temperature: agent_state.temperature - 0.125} # Subtraction to account for colder outside temperature
44       {components, agent_state}
45     end
46   ]
47
48   defgoal ReachDesiredTemp, [
49     components: [:centralheating],
50     attributes: %{target_temperature: 22},
51     state_fields: [
52       {:delta_temperature, [-1, 0, 1], fn(attributes, _knowledge, _components, agent_state) ->
53         %{temperature: temperature} = agent_state
54         %{target_temperature: target_temperature} = attributes
55         Utils.sign(temperature - target_temperature) # +1 = too hot, 0 = ok, -1 = too cold
56       end}
57     ],
58     reward: fn (attributes, _components, _old_components, _knowledge, _old_knowledge, agent_state,
           old_agent_state) ->
59       target_temperature = attributes.target_temperature
60       if (abs(agent_state.temperature - target_temperature) <= 1) do
61         10000
62       else
63         old_difference = abs(old_agent_state.temperature - target_temperature)
64         new_difference = abs(agent_state.temperature     - target_temperature)
65         if (old_difference >= new_difference), do: 5, else: -500
66       end
67     end
68   ]


                                           Fig. 2. Marlon code of the example smart grid
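
The Q-learning setup that Sec. II describes for the ReachDesiredTemp goal can be sketched in plain Elixir, independent of Marlon's macros. This is a minimal sketch and not Marlon's actual implementation: the module name QSketch, the learning rate alpha, the discount factor gamma and the purely greedy action choice are assumptions made for illustration. Only the three abstract states, the two actions and the reward values are taken from Fig. 2.

```elixir
defmodule QSketch do
  # Hypothetical helper module for illustration; not part of Marlon.
  @actions [:on, :off]
  @target 22

  # Abstract state as in Fig. 2: 1 = too hot, 0 = on target, -1 = too cold.
  def abstract_state(temperature) do
    cond do
      temperature > @target -> 1
      temperature < @target -> -1
      true -> 0
    end
  end

  # Reward values copied from the reward function in Fig. 2.
  def reward(old_temp, new_temp) do
    if abs(new_temp - @target) <= 1 do
      10_000
    else
      if abs(old_temp - @target) >= abs(new_temp - @target), do: 5, else: -500
    end
  end

  # Greedy action for a state; unseen Q-table entries default to 0.0.
  def best_action(q, state) do
    Enum.max_by(@actions, fn action -> Map.get(q, {state, action}, 0.0) end)
  end

  # Standard Q-learning update [7]:
  #   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
  # alpha and gamma are assumed defaults, not values from the paper.
  def update(q, state, action, reward, next_state, alpha \\ 0.5, gamma \\ 0.9) do
    old = Map.get(q, {state, action}, 0.0)

    best_next =
      @actions
      |> Enum.map(fn a -> Map.get(q, {next_state, a}, 0.0) end)
      |> Enum.max()

    Map.put(q, {state, action}, old + alpha * (reward + gamma * best_next - old))
  end
end
```

With these assumed defaults, `QSketch.update(%{}, -1, :on, 5, -1)` stores the value 2.5 for the pair `{-1, :on}`, after which `QSketch.best_action/2` prefers `:on` in the "too cold" state.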


              Fig. 3. Temperature evolution of a single house

    step = 1
    executeAndUpdate(step)

    loop {
        step++
        Action selection + execution
        executeAndUpdate(step)
        Learning reward is computed
    }

    def executeAndUpdate(int x) {
        Agents publish output data
        World updates input data
        World and all agents execute step x
        World publishes output data
        Agents update input data
    }
                                                                                        Fig. 4. Pseudocode for the execution loop of a multi-agent system
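
The loop of Fig. 4 can also be sketched as ordinary Elixir, using the smart grid values from Fig. 2. This is only an illustrative, sequential stand-in: the module LoopSketch, its function names and the `policy` argument are hypothetical, agents and the world are plain maps rather than actors, and the reward computation is left out.

```elixir
defmodule LoopSketch do
  # Hypothetical, sequential stand-in for the loop in Fig. 4.
  # In Marlon, the world and the agents are actors; here they are plain maps.

  # Mirrors executeAndUpdate in Fig. 4.
  def execute_and_update(world, agents) do
    # Agents publish output data; the world updates its input data.
    consumptions = Enum.map(agents, fn {_id, agent} -> agent.power_consumption end)
    world = Map.put(world, :agents_power_consumption, consumptions)

    # The world and all agents execute their step.
    agents = Map.new(agents, fn {id, agent} -> {id, step(agent)} end)
    world = Map.put(world, :power_consumption, Enum.sum(consumptions))

    # The world publishes output data; agents update their input data.
    agents =
      Map.new(agents, fn {id, agent} ->
        {id, Map.put(agent, :world_power_consumption, world.power_consumption)}
      end)

    {world, agents}
  end

  # Agent step as in Fig. 2: the house cools slightly at each iteration.
  def step(agent), do: %{agent | temperature: agent.temperature - 0.125}

  # The "on" behaviour of Fig. 2; "off" leaves the agent unchanged.
  def act(agent, :on) do
    %{
      agent
      | temperature: agent.temperature + 1,
        power_consumption: agent.power_consumption + 100
    }
  end

  def act(agent, :off), do: agent

  # One loop iteration: select and execute an action, then update.
  # `policy` is an assumed function from an agent map to :on or :off.
  def iterate({world, agents}, policy) do
    agents = Map.new(agents, fn {id, agent} -> {id, act(agent, policy.(agent))} end)
    execute_and_update(world, agents)
  end
end
```

Starting from two houses with temperature 5 and power consumption 0, one call to `LoopSketch.iterate/2` with a policy that always returns `:on` leaves each house at temperature 5.875 and the world at a total power consumption of 200.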


to compute the agent’s next state, based on its current state
and the world’s state.
   Component - A component is part of an agent. It can
(optionally) have its own internal state. It only contains a
number of functions that define the possible behaviours of this
component. Only one of these functions is executed at each
iteration of the system. Which function will be executed is
determined by the machine learning algorithm.³

   ³ Alternatively, it is also possible to write your own function that chooses
which behaviour is executed, rather than letting the machine learning
algorithm choose.

   Goal - Finally, a goal specifies a desired property that
an agent should reach, by means of a Q-learning algorithm.
The components field (line 49 in Fig. 2) determines which
components the machine learning algorithm can control. It is
possible to attach multiple goals to the same component, but a
weight function (not shown) should then be specified to determine
which goal has the highest priority. The attributes
field specifies any parameters that may be relevant to the goal.
The state_fields field defines the abstract state space
used by the Q-learning algorithm, together with a function that
maps the current state to an abstracted state. Finally, there is
the reward function that computes a reward value for the
current state of the system, given the previous state.
   As mentioned before, the multi-agent systems implemented
with Marlon are discrete. The execution of such a system
corresponds to a loop where each iteration represents the
system’s next state. The pseudocode in Fig. 4 gives a more
precise idea of what happens in each iteration: first, for each
goal, an action/behaviour is selected from the components it
may affect. This selection is then executed. Next, all agents
make their output data available to the world, which the world
uses to update its input data. After this, all agents execute their
step function. Once this is done, the world publishes its output
data, and makes it available as the input data for all agents. The
computation of the system’s new current state is now finished,
and all that remains is to use the reward function of each goal
to compute how effective its chosen action was.

                      IV. RELATED WORK
   Regarding related work, there are several existing frameworks
and domain-specific languages that cater to specific
types of multi-agent systems.
   For instance, Frenetic [2] and Nettle [6] focus on programming
computer networks. TeenyLime [1], TinyDB [5] and
Semantic Streams [8] tackle querying and composing data in
the area of wireless sensor networks. Whereas these papers do
not involve machine learning techniques to manage networks,
the work of Kara et al. [4] presents a learning-based framework
to automate smart grid management. While the example we
presented is also situated in a smart grid context, our aim for
Marlon is to focus on the more general domain of multi-agent
systems.

               V. CONCLUSION AND FUTURE WORK
   This paper has presented an initial version of Marlon, a
DSL for automating the management of multi-agent systems.
The DSL was illustrated by means of an example in a smart
grid context. As this initial version of the language was also
developed starting from this context, one direction of future
work is to apply the language in other types of multi-agent
systems, and to evolve and extend the language with new
features on an as-needed basis. We also need to evaluate
the language in terms of its expressiveness, how it compares
to frameworks/DSLs that focus on a specific domain, and
how effective Marlon is at reaching its machine learning
goals. Another direction of future work is to add support
for collaboration among agents, so it becomes possible to
specify goals that span groups of agents, rather than
only specifying goals that apply to individual agents.

                         REFERENCES
[1] Paolo Costa, Luca Mottola, Amy L Murphy, and Gian Pietro Picco.
    TeenyLime: transiently shared tuple space middleware for wireless sensor
    networks. In Proceedings of the international workshop on Middleware
    for sensor networks, pages 43–48. ACM, 2006.
[2] Nate Foster, Rob Harrison, Michael J Freedman, Christopher Monsanto,
    Jennifer Rexford, Alec Story, and David Walker. Frenetic: A network
    programming language. In ACM Sigplan Notices, volume 46, pages 279–291.
    ACM, 2011.
[3] Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore.
    Reinforcement learning: A survey. Journal of artificial intelligence
    research, 4:237–285, 1996.
[4] Emre Can Kara, Mario Berges, Bruce Krogh, and Soummya Kar. Using
    smart devices for system-level management and control in the smart grid:
    A reinforcement learning framework. In Smart Grid Communications
    (SmartGridComm), 2012 IEEE Third International Conference on, pages
    85–90. IEEE, 2012.
[5] Samuel R Madden, Michael J Franklin, Joseph M Hellerstein, and Wei
    Hong. TinyDB: an acquisitional query processing system for sensor
    networks. ACM Transactions on database systems (TODS), 30(1):122–173,
    2005.
[6] Andreas Voellmy and Paul Hudak. Nettle: Taking the sting out of
    programming network routers. Practical Aspects of Declarative Languages,
    pages 235–249, 2011.
[7] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine
    learning, 8(3-4):279–292, 1992.
[8] Kamin Whitehouse, Feng Zhao, and Jie Liu. Semantic streams: A
    framework for composable semantic interpretation of sensor data. Wireless
    Sensor Networks, pages 5–20, 2006.

