=Paper=
{{Paper
|id=Vol-1782/paper_7
|storemode=property
|title=The Use of Data Packets in a Behaviour Network to Improve the Action Selection Mechanism
|pdfUrl=https://ceur-ws.org/Vol-1782/paper_7.pdf
|volume=Vol-1782
|authors=Brian Peach,Peter Robinson
|dblpUrl=https://dblp.org/rec/conf/plansig/PeachR16
}}
==The Use of Data Packets in a Behaviour Network to Improve the Action Selection Mechanism==
The use of Data Packets in a Behaviour Network to improve the Action Selection Mechanism Brian Peach1, Peter Robinson2 Department of Computer Science, University of Hull, Hull, United Kingdom 1 b.peach@hull.ac.uk, 2p.a.robinson@hull.ac.uk Abstract metadata into the data packets it is possible to solve the lim- This paper presents a novel approach in the field of behaviour itation outlined above. networks, using data packets to traverse a behaviour network The rest of this paper is organised as follows. Section 2 enabling an agent to more accurately select appropriate be- presents background and related work for behaviour net- haviours. Behaviour networks are used as an action selection works. Section 3 describes the proposed data packet concept mechanism to allow agents to select the most appropriate be- haviour in a given situation. Behaviour networks incorporate in detail. Section 4 shows the results of the experiments of an energy spreading mechanism that allows it to determine the proposed solution. The Final section presents a summary which behaviours to execute. This research demonstrates and future works. that there are better methods for spreading activation energy around an action selection mechanism (ASM) and this is shown by implementing data packets to give an accurate se- Background lection of behaviours in a behaviour network. In the early 90s Maes proposed the Agent Network Archi- tecture (ANA) (Maes 1991c; Maes 1991b; Maes 1991a) as Introduction a planning mechanism to enable the selection of the most For many years’ robots have been developed to solve a va- appropriate behaviour from a range of alternatives. This riety of tasks (Brooks 1986; Min & Cho 2010; Dorer 1999; mechanism is based on the idea of a notional 'energy', which Paikan et al. 2013; Petrick & Foster 2013). Each of these has is allocated to a goal and spreads through a connection net- applied a different control architecture to enable robotic per- work to possible actions. Behaviours may contribute to- formance of various tasks, but all have struggled to support wards goals or inhibit their attainment; the combination of flexibility in the face of dynamic environments. One mech- the network connectivity and the energy spreading mecha- anism which may help in this regard is the behaviour net- nism determines which behaviour is preferred at any given work, an action selection mechanism that can be used in time. This mechanism encapsulates a combination of reac- control architectures (Lee & Cho 2014) to enable agents to tive and planned behaviour selection, and is well suited to select appropriate actions in changing situations. dynamic environments. A behaviour network typically consists of a directed The behaviour network is defined by a collection of links graph representing the behaviours that an agent can perform amongst nodes representing behaviours, goals and the envi- and the restrictions upon them. Depending on the situation, ronment. Energy is added to the network by applying it to considering both goals and sensor data, such a network will goal nodes or environment nodes, which encapsulate obser- reactively select the best behaviour for the prevailing con- vations that the agents make of the environment. Figure 1 straints, though with certain limitations (Tyrrell 1994). (Tyrrell 1994) shows the components of a behaviour node, This work aims to address these limitations with the in- and the types of link which connect to it. Each link allows troduction of quantization and recording of the energy used the transfer of energy to or from other nodes. to evaluate the network. The idea is to send data packets Each behaviour contains a precondition list, consisting of through the behaviour network to transport the necessary en- a list of propositions that must be true in order for the be- ergy from one behaviour to another. By embedding haviour to be executable. Behaviours also have an 'add list' of propositions that the behaviour makes true once it has huihhiuhuihiuhiuhiuhiuhiuhiuhiuu been executed. Finally; the behaviours have a 'delete list' of energy in each node, while conflictor links will inhibit the propositions that the behaviour makes false once it has been spreading of the activation energy. executed. When the energy spreading mechanism is complete, the average energy of all nodes is normalised, and nodes with energy levels above a defined threshold are selected as can- didates for activation. At any given time, the node with the greatest energy represents the preferred behaviour. The original rules and formulas for each of the link types were reviewed and improved by (Tyrrell 1993; Tyrrell 1994) and then extended by (Decugis & Ferber 1998) who claimed to have solved some of the problems found in the original behaviour network (such as deadlock between mul- tiple conflicting goals and improving reactivity between ac- tions). Figure 1. The components of a behaviour (Brooks 1986, Maes 1991, Tyrrell 1994) The predecessor links in the network connect two behav- iour nodes when; a proposition is false, the proposition is in the precondition list of node A and the proposition is in the add list of node B. So node B (once executed) can help node A become executable, will create an active predecessor link from node A to node B. The successor links in the network connect two behaviour nodes when; a proposition is false, the proposition is in the add list of node A and the proposition is in the precondition list of node B. So node A (once executed) can help node B become executable, will create an active successor link from node A to node B. Figure 2. An example behaviour network A conflictor link in the network inhibits a behaviour node when; a proposition is true, the proposition is in the precon- An example behaviour network is shown in Figure 2. It dition list of node A and the proposition is in the delete list shows two goals and two environment nodes connected to of node B. So node B will prevent node A from been exe- eight different behaviours forming a behaviour network. A cutable and will create an active conflictor link from node A more complex, real world situation is described as a behav- to node B. iour network in Figure 4. The goal links in the network join behaviour nodes with The behaviour network was designed to work in dynamic goal nodes when; the activation energy in a goal is more than environments and is therefore more reactive than traditional zero and goal Y is in the add list of node A. So node A is deliberative planning mechanisms. More recent work has able to achieve goal Y. been done to combine both these types to create a hybrid The environment links in the network join behaviour system (Lee & Cho 2014), They combined the behaviour nodes with environment nodes when; a proposition is true network with a planning mechanism, to plan the sequence and the proposition is in the precondition list of node A. So of actions that could also be adjusted according to excep- node A is relevant to the given situation. tions and environmental changes. To begin the network evaluation process, the external Behaviour networks have not received a lot of interest in nodes (goal and environment nodes) pass energy into the recent years as other techniques in AI have become more network according to formula defined rules (Tyrrell 1993; active and established in the field of controlling robot be- Tyrrell 1994). Once activation energy has been added to the haviour. The motivation for returning to the technique in this behaviour nodes of the network from the external sources, work stems from a larger project to assist robots in complex then the energy is spread to identify the ideal action. The and dynamic environments. amount of energy that is spread from one node to another is During the implementation process of the behaviour net- a defined percentage of that node's activation energy. Each work, some key issues were noted that have not been raised of the nodes in the behaviour network are connected by a in any of the supporting text (Maes 1991a; Tyrrell 1993; combination of predecessor, successor and conflictor links. Tyrrell 1994). These issues revolve around the process of Predecessor and Successor links will increase the amount of spreading energy around the network. (Tyrrell 1993) ex- makes the resulting energy spread depend on the order of plains that energy must first enter the network via the goal evaluation of the nodes in the network rather than its struc- nodes and the environment nodes. The energy is passed be- ture. tween the nodes via the predecessor, successor and conflic- Further, if the energy spreading mechanism is applied it- tor links. However; there is little discussion on the order in eratively, as Maes suggests, the resultant energy distribution which the behaviour nodes should pass energy between varies with iteration, and does not necessarily converge. De- themselves, and this leads to a variety of problems in more pending on the number of iterations of the energy spreading complex networks. mechanism, the behaviour network may select different be- For any given network there may be many possible op- haviours to execute. tions for the order in which the energy should be spread. De- A variety of tests were performed on a behaviour network pending on the order of evaluation of the behaviour nodes, (Figure 2), starting with a random order for which to select for the energy spreading process, the energy allocated to behaviours and spread energy. A sample of the results from specific behaviours after each iteration can vary. This incon- this test in shown in Table 1. sistent distribution of energy in the network, could be caused by selecting different start nodes, nodes not receiving all After 1 it- After 10 it- After 100 it- After 1000 it- their energy before distributing their own and the possibili- eration erations erations erations ties of feedback loops between the behaviours. B1 (0) B1 (0) B1 (0) B1 (0) B2 (0) B2 (0) B2 (0) B2 (0) B3 (0) B3 (0) B3 (0) B3 (0) Method B4 (0) B4 (0) B4 (0) B4 (0) Each of the methods that were used during the implementa- B5 (3.59) B5 (3.1) B5 (3.07) B5 (3.07) tion of a behaviour network adopted a general set of rules B6 (2.93) B6 (3.41) B6 (3.44) B6 (3.45) prior to selecting the order that the behaviours should spread B7 (2.89) B7 (2.89) B7 (2.89) B7 (2.89) the energy between themselves. The external nodes, such as B8 (0.00) B8 (0.00) B8 (0.00) B8 (0.00) the goals and environment nodes, would spread energy into the network first before the internal behaviours could spread Table 1. Results from using a random order of behaviour nodes. energy. This was to ensure that there was sufficient energy The results in Table 1 differed each time the simulation in the network to begin with. Links augmenting other nodes' was executed. Proving that; not only is order that the energy energy levels (predecessor and successor links) were se- spread important, but also that an uneven distribution of en- lected before inhibition nodes (conflictor links) again to en- ergy exists in the network. An alternative method for select- sure that there was energy in the nodes prior to spreading the ing the order to spread the energy was then tested, based on energy. a depth first approach, however; this then led into issues The early stages of implementation revealed a variety of with potential feedback loops existing in the network. The different problems, relating to the mechanism of energy previously suggested methods for spreading energy through spreading and the consistency of its results. One such issue the network are clearly inadequate for more complex net- is possibility of a cyclical structure within the network (as works, each leading to an undesirable distribution of energy shown in Figure 2, for example, prevents a clear definition in the behaviour network. of completion of energy spreading, since each node in the We propose a new mechanism to allow the behaviour net- cycle passes energy to its successors, which in turn pass en- work to evenly distribute energy around the network regard- ergy back. Such loops must be interrupted arbitrarily, and less of the order of evaluation or the number of iterations: the resulting energy at each node depends upon the point of the use of data packets. interruption in the cycle (which is not desired). Even if a given network does not contain cycles, the re- sults can depend on the order of evaluation of links. For Data Packets example, energy can be spread from each connected behav- The main problem of instability in the behaviour network iour in sequence following the links from each node in a occurs for two reasons. Firstly, for a node to correctly dis- depth-first search pattern, following the links until the end tribute energy to its successors, it must first have received of the tree is reached and then back tracking to an unex- its full complement from all its predecessors. Whether or plored node. This mechanism is simple, but has the flaw that not this has happened is hard to determine, and depends the system does not know whether a behaviour has received upon the order of evaluation of the nodes. Secondly, if the all of its inputs before sending energy to the next node. This network contains cycles, whether or not a given node has received all its input energy remains undefined because nodes which have contributed to its energy, it can be pre- some of the energy it emits will return as an input. vented from giving energy to a node from which it has al- The proposed solution to both of these issues is to treat ready received it. There can therefore be no evaluation the energy as packets with associated metadata rather than loops, and iteration becomes unnecessary. simple values. When a behaviour needs to send its energy to the next behaviour it will create a packet to send (Figure 3). In addition to an energy value, each packet contains a list of Results all of the previous behaviours from which it has received The data packet approach was tested on a basic behaviour energy, this permits a recipient behaviour to reject energy network, containing a small number of arbitrary behaviours which has originated from itself, or if required to prevents it each connected via different types of links. Figure 4 shows from sending more energy to a node which has already re- this test scenario in a behaviour network. Here there are two ceived it. goals a robot could have; ‘Explore’ will have the robot ex- An additional modification to behaviour nodes accommo- plore and learn about its environment and ‘Collect Yellow dates the tracking of energy packets. Each behaviour now Blocks’ would have the robot find and move yellow blocks stores not only its current energy level, but also, a list of all to a different location. The goal ‘Explore’ can be achieved of the data packets that it has received and a total energy with a behaviour ‘Move’, which will have the robot move in value which is a sum of its energy and the energy of all of a random direction. The behaviours ‘Pick up Yellow Block’ the packets that it contains. and ‘Put down Yellow Block’ allow the robot to interact with a yellow block to achieve the goal of ‘Collect Yellow Blocks’. Finally, ‘Pick up Blue Block’ is included allow for conflict in the system; this is a desirable action but one which prevents the robot performing others. The situation of the environment is that the robot is in front of a yellow block which is ready to be picked up, the robot's hand is empty and it is too far from a blue block to interact with it without mov- Figure 3. The contents of a behaviour and a packet ing. Using these conditions, the appropriate links are used to connect the behaviours, shown in Figure 4. The first implementation of the data packet approach fol- When a behaviour needs to send energy it will create a lowed the same approach as (Tyrrell 1994) where energy packet and put a proportion of its own energy into the packet was sent into the network first via the goal and environment (using a formula based on the link type) (Tyrrell 1994). The links. The predecessor links were then used to send energy packet will then travel along the links of the same type (e.g. around the network followed by successor links and con- predecessor). When it arrives at a new behaviour the packet cluding with inhibiting energy via the conflictor links. The will be stored in that behaviours list of packages. The system results from this implementation is shown in Table 2. will then check if the behaviour that packet arrived at has an outward link of the same type and if so, the behaviour will create another packet and send a proportion of energy (from the packet it just received) to the next behaviour. This pro- cess of creating and sending packets will continue until no packet can travel any further through the network. This approach to labelling and tracking energy in packet- ized form allows much simpler and more consistent evalua- tion of the network. Because at any given time a node's en- ergy is stored as a list of component packets rather than a single value, a node can spread only new energy each time it is received without the duplication inherent in maintaining only a single value. The order of evaluation of nodes ceased to be of importance; all that is required is that each node is evaluated once. Cycles in the network (which are valid log- ical constructions, but make evaluation difficult) also cease Figure 4. An example of a real world situation shown as a behav- to be a problem; because each packet contains a list of the iour network. Copyright © 2015, Association for the Advancement of Artificial Intelli- gence (www.aaai.org). All rights reserve multiple links are introduced the system needs to be modi- Behaviour Energy Total Energy fied. Put down Yellow Block 0 60 The second implementation involved merging the energy Pick up Yellow Block 20 60 in each of the packets with the remaining energy stored in Move 30 30 each behaviour for each type of link. For example; following from the previous example; behaviour 1 has sent 40 of its 50 Pick up Blue Block 0 27.5 energy to behaviour 2 via a predecessor link and it has re- Table 2. Results from using the packet approach with different ceived a packet from another behaviour. Previously it would types of links. create a packet using the remaining 10 energy it has and send that via a successor link, instead; before it creates a new The results in Table 2 show that the agent does not have packet it will merge the energy in its current list of packets a clear choice for what it believes that the best action to take with the 10 energy it has remaining and send that proportion. would be. The behaviours “Put down Yellow Block” and This concept will show that the data packets can now dis- “Pick up Yellow Block” had equal highest energy even tribute an even amount of energy around the network. though the “Put down Yellow Block” behaviour cannot be executed given the situation of the robot. The reason for Behaviour Energy these results is the order in which the energy is passed via Put down Yellow Block 120 the links. The results in Table 2 followed the order; prede- cessor, successor and conflictor. Move 30 The literature on behaviour networks states, that once the Pick up Yellow Block 23.75 energy spreading mechanism has concluded, the system Pick up Blue Block -8.75 should check for behaviours whose energy is greater than a Table 3. Results from using the packet approach with merging us- global threshold. If there are no matches then the energy ing different types of links. spreading mechanism should be repeated until a behaviour has surpassed the global threshold. It was then noted that Table 3 shows the results from using this approach with a there was a possibility for the “Move” behaviour to win out link order of; predecessor, successor and conflictor. It shows over the other behaviours, especially after multiple itera- that the ‘Pick up Blue Block’ has a negative value, this value tions of the energy spreading mechanism. This is because is accurate as it is a behaviour that does not benefit the sys- the “Move” behaviour is in a separate ‘system’ to the other tem. The ‘Put down Yellow Block’ has the most energy with behaviours, making it so it does not need to pass its energy 120, even though it is a behaviour that cannot be executed to any other behaviour and that it will not be inhibited either. based on the current situation that the robot is in. Finally; This is a problem if the other ‘systems’ in the behaviour net- the ‘Pick up Yellow Block’ has a value of 23.75, which work are what we would like the robot to do, more than ex- again based on the current situation is incorrect. The order ploring the environment. To solve this the amount of energy of the links was then changed to successor, predecessor and passed from the goal ‘Explore’ would need to be reduced or conflictor for the third implementation and the results are have some other inhibition setting that could be applied to shown in Table 4. it. The method that the behaviours use to pass energy to other behaviours is to send a proportion of energy based on Behaviour Energy its current energy level. This could explain the unsatisfac- Pick up Yellow Block 46.25 tory results as when a behaviour sends energy to another be- Move 30 haviour via a predecessor link, it is reducing the amount of Pick up Blue Block 13.75 energy that behaviour has. When that same behaviour then Put down Yellow Block 0 has to send energy via a different link type it may find that Table 4. Results from using the packet approach with merging us- its energy level is lower than expected or empty. For exam- ing different types of links. ple; say behaviour 1 has an energy of 50 and it sends 40 en- ergy to behaviour 2 via a predecessor link. Behaviour 1 then Table 4 shows the results from using the packet approach has 10 energy left for when it needs to send energy to be- with energy merging following the link order of; successor, haviour 3 via a successor link. However; behaviour 1 could predecessor and conflictor. Here the ‘Pick up Blue Block’ have also received a packet of energy from another behav- behaviour has a value of 13.75 which is low in comparison iour making it so that behaviour 1 has 10 energy and a to the values in other behaviours. This means that this be- packet of 30 energy. The current solution of the system has haviour would be unlikely to be chosen for activation. The the behaviour only send energy from its own source and not ‘Put down Yellow Block’ behaviour has a value of 0, which from any packets it may have received. This would work again is correct given the current situation the robot is in. fine if there was only one type of link to consider but when Finally; the ‘Pick up Yellow Block’ behaviour has the most energy in that subsystem with a value of 46.25, making this Maes, P., 1991c. The Agent Network Architecture ( A N A ). , 2(4), the most likely behaviour to be selected. Overall these re- pp.115–120. sults show that the two executable behaviours were favoured Min, H.J. & Cho, S.B., 2010. Adaptive behaviors of reactive mo- bile robot with Bayesian inference in nonstationary environments. over the non-executable behaviours in the system. Applied Intelligence, 33(3), pp.264–277. Paikan, A., Metta, G. & Natale, L., 2013. A port-arbitrated mech- Conclusion anism for behavior selection in humanoid robotics. 2013 16th In- ternational Conference on Advanced Robotics, ICAR 2013. Behaviour networks are useful action selection mechanisms Petrick, R.P. and Foster, M.E., 2013, June. Planning for Social In- and are used in a variety of different control architectures. teraction in a Robot Bartender Domain. In ICAPS. As documented there are potential flaws in the original text Tyrrell, T., 1994. An evaluation of Maes’s bottom-up mechanism (Maes 1991a; Tyrrell 1994) whereby the order of the energy for behavior selection. Adaptive Behavior, 2(4), pp.307–348. spreading mechanism greatly affects the outcome of the se- Tyrrell, T., 1993. Computational mechanisms for action selection. lected behaviour for execution. This work presents a solu- , pp.1–218. tion to this problem by introducing data packets to traverse and distribute energy throughout a behaviour network. The data packets have been shown to be able to distribute energy throughout the network regardless of the order that the be- haviours are selected to distribute energy in. The results show that the data packet approach is the op- timum method for spreading energy around a behaviour net- work accurately. The results show that the best method is to implement the energy merging technique and to follow the order of successor, predecessor and Conflictor for sending packets through a behaviour network. The preliminary results presented in this paper form part of a larger project to develop a dynamic behaviour network. A system which can manipulate the behaviours in the be- haviour network allowing an agent to work in complicated and unstructured environments. The future work will be to test the data packet methodology in more complicated and challenging scenarios. This work will then be implemented into the larger project of developing a dynamic behaviour network. References Brooks, R.A., 1986. A Robust Layered Control System For A Mo- bile Robot. IEEE Journal on Robotics and Automation, 2(1), pp.14–23. Decugis, V. & Ferber, J., 1998. An extension of Maes’ action se- lection mechanism for animats. From animals to animats 5: Proc. Fifth Int. …, (October), pp.1–8. Dorer, K., 1999. Behavior networks for continuous domains using situation-dependent motivations. IJCAI International Joint Confer- ence on Artificial Intelligence, 2, pp.1233–1238. Lee, Y.S. & Cho, S.B., 2014. A hybrid system of hierarchical plan- ning of behaviour selection networks for mobile robot control. In- ternational Journal of Advanced Robotic Systems, 11(1). Maes, P., 1991a. A Bottom-up Mechanism For Behaviour Selec- tion In An Artificial Creature. , pp.238–246. Maes, P., 1991b. Learning to Coordinate Behaviours. Learning. Copyright © 2015, Association for the Advancement of Artificial Intelli- gence (www.aaai.org). All rights reserved.