=Paper= {{Paper |id=Vol-1782/paper_7 |storemode=property |title=The Use of Data Packets in a Behaviour Network to Improve the Action Selection Mechanism |pdfUrl=https://ceur-ws.org/Vol-1782/paper_7.pdf |volume=Vol-1782 |authors=Brian Peach,Peter Robinson |dblpUrl=https://dblp.org/rec/conf/plansig/PeachR16 }} ==The Use of Data Packets in a Behaviour Network to Improve the Action Selection Mechanism== https://ceur-ws.org/Vol-1782/paper_7.pdf
  The use of Data Packets in a Behaviour Network to improve the Action
                                                    Selection Mechanism
                                                  Brian Peach1, Peter Robinson2
                                   Department of Computer Science, University of Hull, Hull, United Kingdom
                                                1
                                                  b.peach@hull.ac.uk, 2p.a.robinson@hull.ac.uk




                             Abstract                                     metadata into the data packets it is possible to solve the lim-
  This paper presents a novel approach in the field of behaviour          itation outlined above.
  networks, using data packets to traverse a behaviour network               The rest of this paper is organised as follows. Section 2
  enabling an agent to more accurately select appropriate be-             presents background and related work for behaviour net-
  haviours. Behaviour networks are used as an action selection
                                                                          works. Section 3 describes the proposed data packet concept
  mechanism to allow agents to select the most appropriate be-
  haviour in a given situation. Behaviour networks incorporate            in detail. Section 4 shows the results of the experiments of
  an energy spreading mechanism that allows it to determine               the proposed solution. The Final section presents a summary
  which behaviours to execute. This research demonstrates                 and future works.
  that there are better methods for spreading activation energy
  around an action selection mechanism (ASM) and this is
  shown by implementing data packets to give an accurate se-                                     Background
  lection of behaviours in a behaviour network.
                                                                          In the early 90s Maes proposed the Agent Network Archi-
                                                                          tecture (ANA) (Maes 1991c; Maes 1991b; Maes 1991a) as
                         Introduction                                     a planning mechanism to enable the selection of the most
For many years’ robots have been developed to solve a va-                 appropriate behaviour from a range of alternatives. This
riety of tasks (Brooks 1986; Min & Cho 2010; Dorer 1999;                  mechanism is based on the idea of a notional 'energy', which
Paikan et al. 2013; Petrick & Foster 2013). Each of these has             is allocated to a goal and spreads through a connection net-
applied a different control architecture to enable robotic per-           work to possible actions. Behaviours may contribute to-
formance of various tasks, but all have struggled to support              wards goals or inhibit their attainment; the combination of
flexibility in the face of dynamic environments. One mech-                the network connectivity and the energy spreading mecha-
anism which may help in this regard is the behaviour net-                 nism determines which behaviour is preferred at any given
work, an action selection mechanism that can be used in                   time. This mechanism encapsulates a combination of reac-
control architectures (Lee & Cho 2014) to enable agents to                tive and planned behaviour selection, and is well suited to
select appropriate actions in changing situations.                        dynamic environments.
   A behaviour network typically consists of a directed                      The behaviour network is defined by a collection of links
graph representing the behaviours that an agent can perform               amongst nodes representing behaviours, goals and the envi-
and the restrictions upon them. Depending on the situation,               ronment. Energy is added to the network by applying it to
considering both goals and sensor data, such a network will               goal nodes or environment nodes, which encapsulate obser-
reactively select the best behaviour for the prevailing con-              vations that the agents make of the environment. Figure 1
straints, though with certain limitations (Tyrrell 1994).                 (Tyrrell 1994) shows the components of a behaviour node,
   This work aims to address these limitations with the in-               and the types of link which connect to it. Each link allows
troduction of quantization and recording of the energy used               the transfer of energy to or from other nodes.
to evaluate the network. The idea is to send data packets                    Each behaviour contains a precondition list, consisting of
through the behaviour network to transport the necessary en-              a list of propositions that must be true in order for the be-
ergy from one behaviour to another. By embedding                          haviour to be executable. Behaviours also have an 'add list'
                                                                          of propositions that the behaviour makes true once it has


huihhiuhuihiuhiuhiuhiuhiuhiuhiuu
been executed. Finally; the behaviours have a 'delete list' of    energy in each node, while conflictor links will inhibit the
propositions that the behaviour makes false once it has been      spreading of the activation energy.
executed.                                                            When the energy spreading mechanism is complete, the
                                                                  average energy of all nodes is normalised, and nodes with
                                                                  energy levels above a defined threshold are selected as can-
                                                                  didates for activation. At any given time, the node with the
                                                                  greatest energy represents the preferred behaviour.
                                                                     The original rules and formulas for each of the link types
                                                                  were reviewed and improved by (Tyrrell 1993; Tyrrell
                                                                  1994) and then extended by (Decugis & Ferber 1998) who
                                                                  claimed to have solved some of the problems found in the
                                                                  original behaviour network (such as deadlock between mul-
                                                                  tiple conflicting goals and improving reactivity between ac-
                                                                  tions).
 Figure 1. The components of a behaviour (Brooks 1986, Maes
                    1991, Tyrrell 1994)

   The predecessor links in the network connect two behav-
iour nodes when; a proposition is false, the proposition is in
the precondition list of node A and the proposition is in the
add list of node B. So node B (once executed) can help node
A become executable, will create an active predecessor link
from node A to node B.
   The successor links in the network connect two behaviour
nodes when; a proposition is false, the proposition is in the
add list of node A and the proposition is in the precondition
list of node B. So node A (once executed) can help node B
become executable, will create an active successor link from
node A to node B.                                                            Figure 2. An example behaviour network
   A conflictor link in the network inhibits a behaviour node
when; a proposition is true, the proposition is in the precon-    An example behaviour network is shown in Figure 2. It
dition list of node A and the proposition is in the delete list   shows two goals and two environment nodes connected to
of node B. So node B will prevent node A from been exe-           eight different behaviours forming a behaviour network. A
cutable and will create an active conflictor link from node A     more complex, real world situation is described as a behav-
to node B.                                                        iour network in Figure 4.
   The goal links in the network join behaviour nodes with           The behaviour network was designed to work in dynamic
goal nodes when; the activation energy in a goal is more than     environments and is therefore more reactive than traditional
zero and goal Y is in the add list of node A. So node A is        deliberative planning mechanisms. More recent work has
able to achieve goal Y.                                           been done to combine both these types to create a hybrid
   The environment links in the network join behaviour            system (Lee & Cho 2014), They combined the behaviour
nodes with environment nodes when; a proposition is true          network with a planning mechanism, to plan the sequence
and the proposition is in the precondition list of node A. So     of actions that could also be adjusted according to excep-
node A is relevant to the given situation.                        tions and environmental changes.
   To begin the network evaluation process, the external             Behaviour networks have not received a lot of interest in
nodes (goal and environment nodes) pass energy into the           recent years as other techniques in AI have become more
network according to formula defined rules (Tyrrell 1993;         active and established in the field of controlling robot be-
Tyrrell 1994). Once activation energy has been added to the       haviour. The motivation for returning to the technique in this
behaviour nodes of the network from the external sources,         work stems from a larger project to assist robots in complex
then the energy is spread to identify the ideal action. The       and dynamic environments.
amount of energy that is spread from one node to another is          During the implementation process of the behaviour net-
a defined percentage of that node's activation energy. Each       work, some key issues were noted that have not been raised
of the nodes in the behaviour network are connected by a          in any of the supporting text (Maes 1991a; Tyrrell 1993;
combination of predecessor, successor and conflictor links.       Tyrrell 1994). These issues revolve around the process of
Predecessor and Successor links will increase the amount of
spreading energy around the network. (Tyrrell 1993) ex-          makes the resulting energy spread depend on the order of
plains that energy must first enter the network via the goal     evaluation of the nodes in the network rather than its struc-
nodes and the environment nodes. The energy is passed be-        ture.
tween the nodes via the predecessor, successor and conflic-         Further, if the energy spreading mechanism is applied it-
tor links. However; there is little discussion on the order in   eratively, as Maes suggests, the resultant energy distribution
which the behaviour nodes should pass energy between             varies with iteration, and does not necessarily converge. De-
themselves, and this leads to a variety of problems in more      pending on the number of iterations of the energy spreading
complex networks.                                                mechanism, the behaviour network may select different be-
   For any given network there may be many possible op-          haviours to execute.
tions for the order in which the energy should be spread. De-       A variety of tests were performed on a behaviour network
pending on the order of evaluation of the behaviour nodes,       (Figure 2), starting with a random order for which to select
for the energy spreading process, the energy allocated to        behaviours and spread energy. A sample of the results from
specific behaviours after each iteration can vary. This incon-   this test in shown in Table 1.
sistent distribution of energy in the network, could be caused
by selecting different start nodes, nodes not receiving all      After 1 it-   After 10 it-    After 100 it-     After 1000 it-
their energy before distributing their own and the possibili-    eration       erations        erations          erations
ties of feedback loops between the behaviours.                   B1 (0)        B1 (0)          B1 (0)            B1 (0)
                                                                 B2 (0)        B2 (0)          B2 (0)            B2 (0)
                                                                 B3 (0)        B3 (0)          B3 (0)            B3 (0)
                         Method
                                                                 B4 (0)        B4 (0)          B4 (0)            B4 (0)
Each of the methods that were used during the implementa-        B5 (3.59)     B5 (3.1)        B5 (3.07)         B5 (3.07)
tion of a behaviour network adopted a general set of rules       B6 (2.93)     B6 (3.41)       B6 (3.44)         B6 (3.45)
prior to selecting the order that the behaviours should spread   B7 (2.89)     B7 (2.89)       B7 (2.89)         B7 (2.89)
the energy between themselves. The external nodes, such as       B8 (0.00)     B8 (0.00)       B8 (0.00)         B8 (0.00)
the goals and environment nodes, would spread energy into
the network first before the internal behaviours could spread     Table 1. Results from using a random order of behaviour nodes.
energy. This was to ensure that there was sufficient energy         The results in Table 1 differed each time the simulation
in the network to begin with. Links augmenting other nodes'      was executed. Proving that; not only is order that the energy
energy levels (predecessor and successor links) were se-         spread important, but also that an uneven distribution of en-
lected before inhibition nodes (conflictor links) again to en-   ergy exists in the network. An alternative method for select-
sure that there was energy in the nodes prior to spreading the   ing the order to spread the energy was then tested, based on
energy.                                                          a depth first approach, however; this then led into issues
   The early stages of implementation revealed a variety of      with potential feedback loops existing in the network. The
different problems, relating to the mechanism of energy          previously suggested methods for spreading energy through
spreading and the consistency of its results. One such issue     the network are clearly inadequate for more complex net-
is possibility of a cyclical structure within the network (as    works, each leading to an undesirable distribution of energy
shown in Figure 2, for example, prevents a clear definition      in the behaviour network.
of completion of energy spreading, since each node in the           We propose a new mechanism to allow the behaviour net-
cycle passes energy to its successors, which in turn pass en-    work to evenly distribute energy around the network regard-
ergy back. Such loops must be interrupted arbitrarily, and       less of the order of evaluation or the number of iterations:
the resulting energy at each node depends upon the point of      the use of data packets.
interruption in the cycle (which is not desired).
   Even if a given network does not contain cycles, the re-
sults can depend on the order of evaluation of links. For                               Data Packets
example, energy can be spread from each connected behav-
                                                                 The main problem of instability in the behaviour network
iour in sequence following the links from each node in a
                                                                 occurs for two reasons. Firstly, for a node to correctly dis-
depth-first search pattern, following the links until the end
                                                                 tribute energy to its successors, it must first have received
of the tree is reached and then back tracking to an unex-
                                                                 its full complement from all its predecessors. Whether or
plored node. This mechanism is simple, but has the flaw that
                                                                 not this has happened is hard to determine, and depends
the system does not know whether a behaviour has received
                                                                 upon the order of evaluation of the nodes. Secondly, if the
all of its inputs before sending energy to the next node. This
                                                                 network contains cycles, whether or not a given node has
received all its input energy remains undefined because                    nodes which have contributed to its energy, it can be pre-
some of the energy it emits will return as an input.                       vented from giving energy to a node from which it has al-
   The proposed solution to both of these issues is to treat               ready received it. There can therefore be no evaluation
the energy as packets with associated metadata rather than                 loops, and iteration becomes unnecessary.
simple values. When a behaviour needs to send its energy to
the next behaviour it will create a packet to send (Figure 3).
In addition to an energy value, each packet contains a list of                                       Results
all of the previous behaviours from which it has received                  The data packet approach was tested on a basic behaviour
energy, this permits a recipient behaviour to reject energy                network, containing a small number of arbitrary behaviours
which has originated from itself, or if required to prevents it            each connected via different types of links. Figure 4 shows
from sending more energy to a node which has already re-                   this test scenario in a behaviour network. Here there are two
ceived it.                                                                 goals a robot could have; ‘Explore’ will have the robot ex-
   An additional modification to behaviour nodes accommo-                  plore and learn about its environment and ‘Collect Yellow
dates the tracking of energy packets. Each behaviour now                   Blocks’ would have the robot find and move yellow blocks
stores not only its current energy level, but also, a list of all          to a different location. The goal ‘Explore’ can be achieved
of the data packets that it has received and a total energy                with a behaviour ‘Move’, which will have the robot move in
value which is a sum of its energy and the energy of all of                a random direction. The behaviours ‘Pick up Yellow Block’
the packets that it contains.                                              and ‘Put down Yellow Block’ allow the robot to interact
                                                                           with a yellow block to achieve the goal of ‘Collect Yellow
                                                                           Blocks’. Finally, ‘Pick up Blue Block’ is included allow for
                                                                           conflict in the system; this is a desirable action but one
                                                                           which prevents the robot performing others. The situation of
                                                                           the environment is that the robot is in front of a yellow block
                                                                           which is ready to be picked up, the robot's hand is empty and
                                                                           it is too far from a blue block to interact with it without mov-
        Figure 3. The contents of a behaviour and a packet                 ing. Using these conditions, the appropriate links are used to
                                                                           connect the behaviours, shown in Figure 4.
                                                                               The first implementation of the data packet approach fol-
    When a behaviour needs to send energy it will create a                 lowed the same approach as (Tyrrell 1994) where energy
packet and put a proportion of its own energy into the packet              was sent into the network first via the goal and environment
(using a formula based on the link type) (Tyrrell 1994). The               links. The predecessor links were then used to send energy
packet will then travel along the links of the same type (e.g.             around the network followed by successor links and con-
predecessor). When it arrives at a new behaviour the packet                cluding with inhibiting energy via the conflictor links. The
will be stored in that behaviours list of packages. The system             results from this implementation is shown in Table 2.
will then check if the behaviour that packet arrived at has an
outward link of the same type and if so, the behaviour will
create another packet and send a proportion of energy (from
the packet it just received) to the next behaviour. This pro-
cess of creating and sending packets will continue until no
packet can travel any further through the network.
    This approach to labelling and tracking energy in packet-
ized form allows much simpler and more consistent evalua-
tion of the network. Because at any given time a node's en-
ergy is stored as a list of component packets rather than a
single value, a node can spread only new energy each time
it is received without the duplication inherent in maintaining
only a single value. The order of evaluation of nodes ceased
to be of importance; all that is required is that each node is
evaluated once. Cycles in the network (which are valid log-
ical constructions, but make evaluation difficult) also cease              Figure 4. An example of a real world situation shown as a behav-
to be a problem; because each packet contains a list of the                                         iour network.


Copyright © 2015, Association for the Advancement of Artificial Intelli-
gence (www.aaai.org). All rights reserve
                                                                   multiple links are introduced the system needs to be modi-
 Behaviour                      Energy        Total Energy         fied.
 Put down Yellow Block          0             60                      The second implementation involved merging the energy
 Pick up Yellow Block           20            60                   in each of the packets with the remaining energy stored in
 Move                           30            30                   each behaviour for each type of link. For example; following
                                                                   from the previous example; behaviour 1 has sent 40 of its 50
 Pick up Blue Block             0             27.5
                                                                   energy to behaviour 2 via a predecessor link and it has re-
 Table 2. Results from using the packet approach with different    ceived a packet from another behaviour. Previously it would
                         types of links.                           create a packet using the remaining 10 energy it has and send
                                                                   that via a successor link, instead; before it creates a new
   The results in Table 2 show that the agent does not have
                                                                   packet it will merge the energy in its current list of packets
a clear choice for what it believes that the best action to take
                                                                   with the 10 energy it has remaining and send that proportion.
would be. The behaviours “Put down Yellow Block” and
                                                                   This concept will show that the data packets can now dis-
“Pick up Yellow Block” had equal highest energy even
                                                                   tribute an even amount of energy around the network.
though the “Put down Yellow Block” behaviour cannot be
executed given the situation of the robot. The reason for
                                                                             Behaviour                       Energy
these results is the order in which the energy is passed via
                                                                             Put down Yellow Block           120
the links. The results in Table 2 followed the order; prede-
cessor, successor and conflictor.                                            Move                            30
   The literature on behaviour networks states, that once the                Pick up Yellow Block            23.75
energy spreading mechanism has concluded, the system                         Pick up Blue Block              -8.75
should check for behaviours whose energy is greater than a         Table 3. Results from using the packet approach with merging us-
global threshold. If there are no matches then the energy                              ing different types of links.
spreading mechanism should be repeated until a behaviour
has surpassed the global threshold. It was then noted that            Table 3 shows the results from using this approach with a
there was a possibility for the “Move” behaviour to win out        link order of; predecessor, successor and conflictor. It shows
over the other behaviours, especially after multiple itera-        that the ‘Pick up Blue Block’ has a negative value, this value
tions of the energy spreading mechanism. This is because           is accurate as it is a behaviour that does not benefit the sys-
the “Move” behaviour is in a separate ‘system’ to the other        tem. The ‘Put down Yellow Block’ has the most energy with
behaviours, making it so it does not need to pass its energy       120, even though it is a behaviour that cannot be executed
to any other behaviour and that it will not be inhibited either.   based on the current situation that the robot is in. Finally;
This is a problem if the other ‘systems’ in the behaviour net-     the ‘Pick up Yellow Block’ has a value of 23.75, which
work are what we would like the robot to do, more than ex-         again based on the current situation is incorrect. The order
ploring the environment. To solve this the amount of energy        of the links was then changed to successor, predecessor and
passed from the goal ‘Explore’ would need to be reduced or         conflictor for the third implementation and the results are
have some other inhibition setting that could be applied to        shown in Table 4.
it. The method that the behaviours use to pass energy to
other behaviours is to send a proportion of energy based on                  Behaviour                      Energy
its current energy level. This could explain the unsatisfac-                 Pick up Yellow Block           46.25
tory results as when a behaviour sends energy to another be-                 Move                           30
haviour via a predecessor link, it is reducing the amount of                 Pick up Blue Block             13.75
energy that behaviour has. When that same behaviour then                     Put down Yellow Block          0
has to send energy via a different link type it may find that
                                                                   Table 4. Results from using the packet approach with merging us-
its energy level is lower than expected or empty. For exam-
                                                                                       ing different types of links.
ple; say behaviour 1 has an energy of 50 and it sends 40 en-
ergy to behaviour 2 via a predecessor link. Behaviour 1 then          Table 4 shows the results from using the packet approach
has 10 energy left for when it needs to send energy to be-         with energy merging following the link order of; successor,
haviour 3 via a successor link. However; behaviour 1 could         predecessor and conflictor. Here the ‘Pick up Blue Block’
have also received a packet of energy from another behav-          behaviour has a value of 13.75 which is low in comparison
iour making it so that behaviour 1 has 10 energy and a             to the values in other behaviours. This means that this be-
packet of 30 energy. The current solution of the system has        haviour would be unlikely to be chosen for activation. The
the behaviour only send energy from its own source and not         ‘Put down Yellow Block’ behaviour has a value of 0, which
from any packets it may have received. This would work             again is correct given the current situation the robot is in.
fine if there was only one type of link to consider but when       Finally; the ‘Pick up Yellow Block’ behaviour has the most
energy in that subsystem with a value of 46.25, making this                Maes, P., 1991c. The Agent Network Architecture ( A N A ). , 2(4),
the most likely behaviour to be selected. Overall these re-                pp.115–120.
sults show that the two executable behaviours were favoured                Min, H.J. & Cho, S.B., 2010. Adaptive behaviors of reactive mo-
                                                                           bile robot with Bayesian inference in nonstationary environments.
over the non-executable behaviours in the system.
                                                                           Applied Intelligence, 33(3), pp.264–277.
                                                                           Paikan, A., Metta, G. & Natale, L., 2013. A port-arbitrated mech-
                           Conclusion                                      anism for behavior selection in humanoid robotics. 2013 16th In-
                                                                           ternational Conference on Advanced Robotics, ICAR 2013.
Behaviour networks are useful action selection mechanisms                  Petrick, R.P. and Foster, M.E., 2013, June. Planning for Social In-
and are used in a variety of different control architectures.              teraction in a Robot Bartender Domain. In ICAPS.
As documented there are potential flaws in the original text               Tyrrell, T., 1994. An evaluation of Maes’s bottom-up mechanism
(Maes 1991a; Tyrrell 1994) whereby the order of the energy                 for behavior selection. Adaptive Behavior, 2(4), pp.307–348.
spreading mechanism greatly affects the outcome of the se-                 Tyrrell, T., 1993. Computational mechanisms for action selection.
lected behaviour for execution. This work presents a solu-                 , pp.1–218.
tion to this problem by introducing data packets to traverse
and distribute energy throughout a behaviour network. The
data packets have been shown to be able to distribute energy
throughout the network regardless of the order that the be-
haviours are selected to distribute energy in.
   The results show that the data packet approach is the op-
timum method for spreading energy around a behaviour net-
work accurately. The results show that the best method is to
implement the energy merging technique and to follow the
order of successor, predecessor and Conflictor for sending
packets through a behaviour network.
   The preliminary results presented in this paper form part
of a larger project to develop a dynamic behaviour network.
A system which can manipulate the behaviours in the be-
haviour network allowing an agent to work in complicated
and unstructured environments. The future work will be to
test the data packet methodology in more complicated and
challenging scenarios. This work will then be implemented
into the larger project of developing a dynamic behaviour
network.


                           References
Brooks, R.A., 1986. A Robust Layered Control System For A Mo-
bile Robot. IEEE Journal on Robotics and Automation, 2(1),
pp.14–23.
Decugis, V. & Ferber, J., 1998. An extension of Maes’ action se-
lection mechanism for animats. From animals to animats 5: Proc.
Fifth Int. …, (October), pp.1–8.
Dorer, K., 1999. Behavior networks for continuous domains using
situation-dependent motivations. IJCAI International Joint Confer-
ence on Artificial Intelligence, 2, pp.1233–1238.
Lee, Y.S. & Cho, S.B., 2014. A hybrid system of hierarchical plan-
ning of behaviour selection networks for mobile robot control. In-
ternational Journal of Advanced Robotic Systems, 11(1).
Maes, P., 1991a. A Bottom-up Mechanism For Behaviour Selec-
tion In An Artificial Creature. , pp.238–246.
Maes, P., 1991b. Learning to Coordinate Behaviours. Learning.



Copyright © 2015, Association for the Advancement of Artificial Intelli-
gence (www.aaai.org). All rights reserved.